Analysis of Economic Indicators Effects on the Sectors in the S&P500¶

Darius Kavarana & Ziad Bandak¶

Table of Contents:¶

  1. Introduction

  2. Data Collection and Data Processing

  3. Exploratory Analysis and Data Visualization

  4. Analysis, Hypothesis Testing, and Machine Learning

  5. Interpretation: Insight & Policy Decision

Introduction¶

The United States stock market is one of the oldest, largest, and most lucrative markets in the world currently. The overall size of the US market is $25.564 trillion, hosting over 2800 publicly traded companies (which are companies that have already gone through an IPO or Initial Public Offering event). More about IPO’s and how they work with the market can be found here. Because of the volume which the US market has in trades per day, we use multiple index tractors to increase the expected return of a portfolio while minimizing the overall risk of trading in the market. The US market poses multiple index tractors, however, the main 3 are Dow Jones, NASDAQ, and S&P 500. These tickers are given more numerical information (data) by these companies which provide a more comprehensive market view which is weighted by market capitalization. Additionally, we have the NYSE ticker which shows the overall process, growth, and whether the New York Stock Exchange is up or down. We focus on the New York Stock Exchange since the entire US market is based there on Wall Street.

Our analysis focuses on the Standard and Poor's 500 (S&P 500), which is a stock market index, tracking the stock performance of 500 of the largest companies listed on stock exchanges in the United States. These stocks usually consist of blue chip stocks which will almost always have upside potential for investors (examples of companies under the S&P 500: Apple–AAPL, Amazon–AMZN, Chipotle–CMG, etc…). We chose the S&P 500 because of the different types of stocks listed within the index. These stocks operate in multiple different sectors, and a majority have been around for a while providing us with ample data to work with. Additionally, these are some of the most looked at stocks by the general public, large investors (such as Warren Buffet), and larger banks and hedge funds (e.g. JPMC). An article talking about the S&P 500 can be found here.

With modern technology, we as consumers are able to look at data on our own and see which categories of the market would go up. There are plenty of factors which can affect a market's outcome such as economic factors, sudden events such as a pandemic, or even just shifts from word of mouth creating large trade volumes. However, some of these have only short term effects on the market growth which is why looking at economic indicators is one of the ways we can track previous growth of the market (or different sectors in the market) and use it as a way of predicting future outcomes. Major economic indicators which trigger long term shifts in the market include GDP, Unemployment Rates, Inflation, Interest Rates, and plenty more. The market can be thought of as something which is affected by multiple factors, however some affect it more than others, and sometimes data for these indicators isn’t as easily available. This is why we focused on the GDP and Unemployment Rates. Using these indicators for predicting future economic and market conditions is important because it allows the general public to buy and sell, but also gives us a relative idea of if we need to make a shift in our economy early. More about economic indicators can be found here.

Of course, predictions can be affected by the inaccuracy of a model, or old data, or sudden large scale events which trigger long term market shifts, however, we can still use these as a baseline to plan for our futures as normal people investing and as an indicator for overall economic condition.

That is why we decided to focus on this topic. Our background from an early age with investing in the stock market, along with our curiosity for looking deeper as seeing how shifts occur from a few economic indicators gave us motivation to pursue this topic.

In [ ]:
import warnings
warnings.filterwarnings('ignore')

#Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup as bs
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
import matplotlib.ticker as ticker
import seaborn as sns

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Data Collection and Data Processing¶

Wikipedia¶

In order to have all sector data (company general industry), clean tickers, and securities(company names) in a final data set, we scraped from a wikipedia table found here. We scraped this wikipedia page to obtain those 3 main points mentioned above since our dataset which included all the stock data was in text form without those key ways to identify how we would use the data later in the project.

In [ ]:
#wiki data of stocks
req = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
root = bs(req.content, "html")
wiki_df = pd.read_html(root.find('table').prettify())[0]

stocks_df = wiki_df[["Symbol", "Security", "GICS  Sector"]] #cleaning the original data pulled for only relevant columns
stocks_df.head(20)
Out[ ]:
Symbol Security GICS Sector
0 MMM 3M Industrials
1 AOS A. O. Smith Industrials
2 ABT Abbott Health Care
3 ABBV AbbVie Health Care
4 ACN Accenture Information Technology
5 ADBE Adobe Inc. Information Technology
6 AMD Advanced Micro Devices Information Technology
7 AES AES Corporation Utilities
8 AFL Aflac Financials
9 A Agilent Technologies Health Care
10 APD Air Products and Chemicals Materials
11 ABNB Airbnb Consumer Discretionary
12 AKAM Akamai Information Technology
13 ALB Albemarle Corporation Materials
14 ARE Alexandria Real Estate Equities Real Estate
15 ALGN Align Technology Health Care
16 ALLE Allegion Industrials
17 LNT Alliant Energy Utilities
18 ALL Allstate Financials
19 GOOGL Alphabet Inc. (Class A) Communication Services

Kaggle¶

Next is the most computationally expensive part of this tutorial. Each of the 7000+ stocks found in this data set store their data in a unique text file (all found at https://www.kaggle.com/datasets/borismarjanovic/price-volume-data-for-all-us-stocks-etfs) , which will make collecting this data very CPU heavy. Typically, we would not be able to efficiently create and test a tutorial with an initial data set this large and would be forced to truncate the data we work with. However, we had access to a more powerful machine than a basic PC and therefore could keep our scope on the entire S&P 500 rather than a shortened version. Our data included all the data for every day of the stock's life from 1952 to 2017. This meant that within a calendar year we were looking at well over 1000 values per year since each day recorded had 7 different entries which were date, daily open value, daily high value, daily low value, daily close value, daily volume traded, and openInt. This collection process is performed by a loop that runs through every ticker extracted from the wikipedia table and formats it to the expected text file name format in the data set (i.e ticker.us.txt). For our purposes we needed the dates, high, low, and volume traded per day. These initial raw data points let us clean our data to align it with our economic indicators and eventually merge the dataset with the wikipedia web scraping step. As a bit of preprocessing, after every available stock’s data is collected, the unnecessary columns are dropped and the highs and lows are averaged by fiscal quarter and placed into the data frame. To do so we looked at the yearly quarter changes and then took all the values in there as our quarter amounts per year. We did this for a few reasons.

1. One was the large amount of data would have been too much to continually process and would have made our finding skewed or inaccurate. 

2. Our economic indicators we chose (talked more about below) are tracked in quarterly periods so we changed the stock data to quarterly to limit the skew of daily data being compared to quarterly data. 
In [ ]:
#kaggle dataset collecting and processing
import csv
import os

stock_files = os.listdir("Stocks")
quarters = ["01-01", "04-01", "07-01", "10-01", "12-31"]
years = list(range(2013, 2018))
sp500_tickers = list(wiki_df["Symbol"])

kaggle_df = pd.DataFrame()


for ticker in sp500_tickers: #going through all the sp500 tickers
        
    filename = f"{ticker.lower()}.us.txt"
    
    if filename not in stock_files:
        continue
        
    #print(f"Processing {filename}") 
    
    with open("Stocks/" + filename, "r") as daily_data: #opening the file for each tickers data
        data_reader = csv.reader(daily_data)
        i = 0
        
        cols = next(data_reader)
        df = pd.DataFrame(columns=cols)
        
        target_year = 2013
        
        line = next(data_reader)
        year = int(line[0][0:4])
        
        if year > target_year:
            continue
  
        while year != target_year:
            line = next(data_reader)
            year = int(line[0][0:4])
            
        df.loc[-1] = line
        df.index += 1
        
        for line in data_reader:
            df.loc[-1] = line
            df.index += 1
            
        for c in cols[1:-1]:
            df[c] = df[c].astype(float)
            
        df.reset_index(inplace=True)
        df.drop(["OpenInt", "Open", "Close", "index"], axis=1, inplace=True)

            
    row = {"Symbol":ticker}
    
    # Averaging quarterly data
    for year in years:
        for i in range(len(quarters) - 1):
            begin = f"{year}-{quarters[i]}"
            end = f"{year}-{quarters[i + 1]}"
            
            quarter = f"{year} Q{i + 1}"
            quarterly_df = df[(df["Date"] >= begin) & (df["Date"] < end)]
            quarterly_df = quarterly_df[["High","Low","Volume"]].mean()
            quarterly_df.rename({"High": f"{quarter} Mean High", "Low": f"{quarter} Mean Low", "Volume": f"{quarter} Mean Volume"}, inplace=True)
            
            for k, v in dict(quarterly_df).items():
                row[k] = v
                
    row = pd.DataFrame(pd.Series(row)).T
    kaggle_df = pd.concat([kaggle_df, row])
    
kaggle_df.head(30)
Out[ ]:
Symbol 2013 Q1 Mean High 2013 Q1 Mean Low 2013 Q1 Mean Volume 2013 Q2 Mean High 2013 Q2 Mean Low 2013 Q2 Mean Volume 2013 Q3 Mean High 2013 Q3 Mean Low 2013 Q3 Mean Volume 2013 Q4 Mean High 2013 Q4 Mean Low 2013 Q4 Mean Volume 2014 Q1 Mean High 2014 Q1 Mean Low 2014 Q1 Mean Volume 2014 Q2 Mean High 2014 Q2 Mean Low 2014 Q2 Mean Volume 2014 Q3 Mean High 2014 Q3 Mean Low 2014 Q3 Mean Volume 2014 Q4 Mean High 2014 Q4 Mean Low 2014 Q4 Mean Volume 2015 Q1 Mean High 2015 Q1 Mean Low 2015 Q1 Mean Volume 2015 Q2 Mean High 2015 Q2 Mean Low 2015 Q2 Mean Volume 2015 Q3 Mean High 2015 Q3 Mean Low 2015 Q3 Mean Volume 2015 Q4 Mean High 2015 Q4 Mean Low 2015 Q4 Mean Volume 2016 Q1 Mean High 2016 Q1 Mean Low 2016 Q1 Mean Volume 2016 Q2 Mean High 2016 Q2 Mean Low 2016 Q2 Mean Volume 2016 Q3 Mean High 2016 Q3 Mean Low 2016 Q3 Mean Volume 2016 Q4 Mean High 2016 Q4 Mean Low 2016 Q4 Mean Volume 2017 Q1 Mean High 2017 Q1 Mean Low 2017 Q1 Mean Volume 2017 Q2 Mean High 2017 Q2 Mean Low 2017 Q2 Mean Volume 2017 Q3 Mean High 2017 Q3 Mean Low 2017 Q3 Mean Volume 2017 Q4 Mean High 2017 Q4 Mean Low 2017 Q4 Mean Volume
0 MMM 91.312167 90.39165 3000908.633333 98.020156 96.832453 3184629.796875 105.455313 104.375 2593571.890625 115.978413 114.62381 3003412.984127 122.193607 120.529344 3286431.163934 129.457937 128.161111 2362018.47619 133.417656 132.191094 2302554.203125 143.35873 141.33619 3041470.253968 155.016885 152.888197 2458525.803279 151.794762 150.024921 2440807.730159 140.275156 138.233281 2956910.703125 147.416667 145.417778 2596462.174603 148.371311 146.186393 2821154.639344 163.714844 161.959375 1975933.640625 174.425313 172.915781 1730216.75 169.834127 167.954603 2021322.095238 181.455806 179.752903 1989174.451613 199.043492 197.367778 1849468.174603 207.525206 205.418159 1684750.68254 225.0516 222.543667 1925851.566667
0 AOS 16.693183 16.394633 1580067.933333 17.946859 17.548625 1883052.6875 20.077312 19.714828 1230033.8125 24.618 24.195079 1645559.492063 23.568213 23.127361 1678852.622951 23.258302 22.849127 1349398.031746 23.570563 23.206109 1015275.21875 25.370365 24.94981 999006.349206 29.686967 29.156984 1118223.540984 33.707444 33.210063 1020983.460317 33.785922 32.879422 1674719.625 37.247635 36.402302 1528875.650794 34.733787 33.713066 1840718.47541 40.014906 39.273094 1472523.25 46.386844 45.709766 1067305.671875 48.150698 47.271032 1280703.47619 49.546 48.752032 919397.596774 54.049048 53.272238 869093.190476 56.258095 55.529968 1342885.873016 60.550513 59.777517 862284.733333
0 ABT 30.724817 30.309883 12451680.116667 33.482594 32.897703 7951001.546875 32.113391 31.673766 8005789.8125 33.834984 33.331286 7459989.047619 35.685213 35.102541 9174882.016393 36.46273 35.985317 6591339.587302 39.491984 39.013391 5035284.0 40.963206 40.289476 5394024.698413 43.385066 42.705344 5984359.327869 45.528857 44.942651 4981663.079365 45.127313 44.260953 7043142.28125 42.363111 41.620952 6665399.507937 38.360328 37.510508 8101788.295082 38.6665 38.062969 10643153.84375 41.879109 41.256594 9773374.203125 39.127889 38.410063 8882569.047619 42.683161 42.123548 8615529.145161 45.023984 44.466397 6880527.587302 50.187952 49.569937 5433541.793651 55.202797 54.498413 5491270.8
0 ABBV 31.711379 30.96831 10897951.293103 37.874344 36.874609 7384147.078125 38.554094 37.840172 4690873.125 43.141349 42.268556 6319228.111111 44.859967 43.921311 7609473.032787 46.878524 45.901365 6642542.222222 49.877344 48.994797 13670690.390625 57.269476 55.990286 13088959.507937 55.40041 54.023197 13034415.47541 60.105444 59.026444 12052673.666667 60.615578 59.162406 10942893.65625 54.219746 52.613365 9892407.507937 53.032361 51.245393 9461727.672131 58.174437 56.981422 9176243.015625 62.272141 61.371672 7220892.84375 59.454603 58.236333 7871293.349206 61.587597 60.726435 7046768.645161 66.49673 65.606238 5782878.920635 76.087587 74.74454 6246636.222222 93.05493 90.88035 6171383.966667
0 ACN 65.413133 64.46965 2958136.65 71.669828 70.472016 3545633.890625 66.856953 65.881375 3753502.28125 68.918381 67.920349 3565505.31746 75.105098 73.927754 3385196.557377 74.092238 73.077921 3025692.761905 74.035734 73.190891 2751022.140625 77.558571 76.471746 3074781.619048 83.717639 82.393557 2900945.442623 90.404254 89.257111 2328701.730159 94.127406 92.481 2749655.25 100.857413 99.433635 2597977.174603 98.467754 96.567508 3563175.213115 111.829062 110.387188 2252809.296875 110.0425 108.5325 2355634.875 116.823968 115.13 2508397.920635 117.692258 116.198548 2577157.33871 121.100476 119.760794 2231790.698413 129.643016 128.244762 1905023.0 140.0467 138.514 1750601.6
0 ADBE 39.7829 39.11015 3569916.466667 44.708719 43.920484 3709883.078125 48.072375 47.381406 3170725.46875 55.64381 54.627508 3985544.47619 64.792295 63.390164 3924305.557377 65.134952 63.753222 3614820.587302 71.433422 70.293672 2636903.1875 70.770476 69.400349 3332754.111111 75.196279 73.791721 2366642.131148 78.606508 77.328 2462360.793651 81.957266 80.139375 3419304.015625 90.60054 88.735 2952103.492063 87.827552 85.323174 3681589.95082 96.7314 95.154775 2111794.109375 100.877748 99.587905 1807114.6875 106.782841 105.022873 2293033.888889 117.874855 116.368194 2306142.419355 137.783524 135.854397 2570180.857143 150.173397 148.012063 1972602.984127 166.874633 163.9689 3072533.633333
0 AMD 2.672167 2.564667 18729917.45 3.521875 3.371563 28015006.546875 3.874219 3.744844 29267395.953125 3.633492 3.531429 24142526.84127 3.882131 3.758033 25269923.836066 4.116508 3.99381 21170687.333333 4.112656 3.992812 29876772.28125 2.809206 2.703492 18739877.619048 2.868279 2.74918 17290156.918033 2.463016 2.385556 14835027.238095 1.936641 1.833281 13248920.109375 2.268175 2.167619 9464912.730159 2.35041 2.198033 14099698.442623 3.985508 3.764922 22684549.421875 6.523773 6.237422 35252396.84375 8.582698 8.168543 46450313.365079 12.959565 12.396373 64543556.741935 12.506833 11.939441 76875665.444444 13.323238 12.81497 64954818.825397 12.990133 12.568997 61987566.333333
0 AES 10.10842 9.914168 7043960.75 11.204391 10.953047 6695538.890625 11.321672 11.107187 4528306.390625 12.618524 12.345254 6451758.952381 12.553852 12.295115 6094453.245902 12.804984 12.569683 6143209.666667 13.290375 13.076266 4156436.75 12.298159 11.993286 6739359.666667 11.363377 11.076623 7149936.786885 12.214016 11.966571 6735586.222222 11.136202 10.871042 6675684.125 9.324605 9.074429 6452733.015873 9.368844 9.050907 6921824.786885 10.665641 10.432906 5180647.8125 11.885609 11.656453 4655622.859375 11.352571 11.085476 5835446.761905 11.157871 10.906742 5824308.983871 11.298889 11.09727 5616984.539683 11.091317 10.909143 4938647.380952 10.9345 10.755767 5099342.833333
0 AFL 46.2141 45.433267 4104190.6 49.176641 48.289062 3435438.578125 54.800359 54.012328 2189281.828125 59.970825 59.241762 2141384.761905 58.619787 57.822098 2175752.688525 57.771492 57.017746 1926278.460317 56.681 56.033297 2120122.046875 55.331175 54.628984 2444321.063492 57.613 56.80082 2244606.655738 60.131063 59.417365 2073244.571429 57.887156 56.950672 2387365.484375 59.969111 59.055286 2358668.0 57.645623 56.610869 2563169.836066 66.453531 65.644219 2310928.90625 71.264313 70.539875 1739223.609375 69.491175 68.60873 2020415.47619 70.262032 69.446823 2214819.548387 74.90873 74.134111 1846555.492063 80.695032 79.83694 1541004.079365 84.0718 83.197033 1440761.533333
0 A 29.667933 29.165533 5100460.5 30.035625 29.418578 5595136.578125 32.569641 32.047453 3832068.15625 36.757286 36.169667 3353795.761905 40.050869 39.277967 4170020.704918 39.052714 38.360952 2721890.15873 39.777484 39.240562 2585435.890625 39.484873 38.748032 3063606.825397 39.669721 38.975672 2532460.409836 40.752206 40.151476 2448139.761905 37.349875 36.627813 2521089.796875 38.163619 37.420571 3004962.285714 37.845246 36.91459 2357458.967213 42.781375 42.130453 2219362.46875 46.398047 45.748719 1877364.125 45.384619 44.604825 1990839.174603 50.474952 49.724629 2029887.822581 57.100619 56.334349 2009349.047619 62.56181 61.779984 1643051.47619 67.413667 66.701367 1397107.4
0 APD 77.760883 76.798833 1093138.95 81.289891 80.249953 1552260.046875 92.838203 91.196234 2046325.296875 98.905492 97.454413 1143625.253968 104.973115 103.321607 1455185.934426 110.916032 109.299365 1199132.857143 122.2025 120.556875 1628685.890625 127.274921 125.009206 1710348.936508 139.970164 137.682459 1108369.032787 138.894444 136.88254 1078376.428571 130.466094 127.980312 1500850.15625 129.770635 127.481429 1433344.746032 125.588852 122.712131 1905698.672131 138.716563 136.732812 1064761.28125 145.268594 143.489688 1379288.796875 137.335556 135.394921 1452917.984127 139.034194 137.274355 1545373.048387 141.140476 139.533016 1303113.492063 146.273016 144.856825 1410870.507937 156.523133 155.0267 926571.433333
0 AKAM 38.345833 37.470117 3942939.716667 42.344391 41.340719 2741468.78125 47.475531 46.609094 1724353.609375 47.71781 46.69873 2251961.84127 55.955328 54.503295 2719331.836066 56.288587 54.859762 2191404.68254 60.705016 59.619875 1624116.5 60.925302 59.595159 1622168.206349 66.694918 65.303443 1631151.639344 74.717937 73.624524 1419948.698413 72.898438 71.173375 1800375.140625 62.297746 60.930667 2270051.634921 51.614267 49.998952 2834324.114754 53.284647 51.860842 1531503.703125 54.033789 53.00923 1895174.203125 64.308589 62.977471 2097015.238095 66.29605 65.184232 1916476.16129 53.296676 52.291944 2627907.0 48.271071 47.413467 1919155.206349 52.275247 51.428977 2122564.7
0 ALB 59.422 58.35975 738925.166667 58.865891 57.647281 798707.59375 59.128953 58.218312 804109.703125 62.520635 61.596651 662333.936508 61.266967 60.274475 631124.557377 65.074079 64.164619 533168.222222 62.160953 60.983437 1437183.359375 56.594016 55.267 1462997.079365 52.807508 51.484361 2007614.065574 57.841937 56.750413 1095955.634921 47.790344 46.420312 1471876.21875 50.553 49.121714 1480702.650794 54.459115 52.543164 1433544.245902 72.837594 71.230234 1899285.328125 81.901984 80.286344 1283941.09375 84.576016 82.875873 888396.555556 97.426516 95.477742 1036087.532258 109.16 107.409206 1023728.301587 119.617524 117.308175 1430350.84127 140.3418 137.821167 1259730.2
0 ARE 61.36855 60.743033 510877.366667 61.040516 59.928797 812680.375 57.557531 56.515922 497802.3125 56.650143 55.566413 487156.761905 62.462639 61.485738 523381.442623 67.283714 66.416952 380025.349206 70.467438 69.669313 459933.390625 76.38346 75.279413 490275.349206 89.050443 87.467197 471547.639344 86.529873 85.126968 399307.920635 84.633359 83.013094 364752.734375 85.659873 84.143825 468813.809524 77.99 76.298508 515845.04918 91.893234 90.337047 623097.203125 105.790109 103.950516 587047.640625 105.292381 103.161841 609395.222222 111.310645 109.527903 1028053.66129 115.439048 113.911905 727220.412698 120.099048 118.52127 589167.52381 123.519667 122.114167 492825.366667
0 ALGN 31.159017 30.377733 965606.483333 34.975797 33.909 932521.21875 44.089359 43.058906 623338.703125 54.480365 53.042889 1102201.47619 56.988443 55.11541 1269293.868852 52.782825 51.130048 1145288.68254 55.01075 53.947656 664095.84375 53.482111 52.269952 715800.730159 57.678639 56.164098 932696.57377 59.708095 58.407381 876989.904762 60.373438 58.926797 730518.59375 64.794143 63.197921 649318.666667 66.427911 64.275434 654232.180328 77.125 75.716822 578877.9375 91.181948 89.562223 596522.5 94.061211 91.813589 876983.809524 101.910419 99.875898 787098.354839 137.06419 134.170984 1255167.301587 172.816968 169.111603 868528.095238 213.796 208.346733 1115914.7
0 ALLE NaN NaN NaN NaN NaN NaN NaN NaN NaN 42.559722 41.557444 2326585.333333 49.590262 48.319852 1270741.557377 51.807556 50.721571 705595.587302 51.752297 50.791828 587033.5625 51.206841 50.206476 723700.015873 56.497984 55.437393 762305.081967 60.683825 59.811143 564986.84127 60.533687 59.331172 562724.65625 63.634683 62.399492 778905.380952 61.081 59.526738 912952.032787 66.168641 65.057437 652007.0 70.223922 69.284297 588783.5 66.34781 65.213746 800261.730159 70.690645 69.571823 775755.725806 78.576492 77.469111 747584.571429 81.224873 80.155159 517902.968254 86.157333 84.726927 796675.4
0 LNT 19.833667 19.630533 1096029.6 21.7005 21.357781 1031537.703125 22.163375 21.863391 1133217.8125 22.626413 22.287175 1118667.571429 23.384557 23.072443 1122770.180328 25.851143 25.529857 1148605.238095 25.994484 25.654 1289851.59375 28.091159 27.61719 1222716.936508 29.901721 29.356951 1519714.639344 28.033143 27.640937 1334767.587302 27.474594 27.022156 1400986.171875 28.269857 27.814222 1324544.444444 32.027246 31.455148 1505386.147541 35.556656 34.981984 2127155.546875 37.931031 37.304672 1506166.765625 35.969651 35.375222 1294674.31746 37.607484 37.134 1213372.370968 39.966476 39.541794 1381354.412698 41.404048 40.971587 1011096.301587 43.097533 42.592293 1377366.533333
0 ALL 41.775167 41.2432 3434803.983333 45.130828 44.478172 3923570.75 46.851547 46.229 2952061.890625 49.845937 49.225032 2502155.619048 50.444 49.757066 3070192.934426 54.400365 53.872794 2482041.920635 56.904359 56.361063 2327236.03125 62.651508 61.936 2418033.063492 67.630475 66.660803 2569832.672131 65.974413 65.215794 2616602.15873 60.50425 59.549359 3279574.0625 60.445508 59.56127 2817078.380952 61.341361 60.395262 2948406.852459 65.640938 64.841594 1997076.203125 67.744703 67.177312 1629849.046875 69.673952 68.879175 2349906.809524 78.158903 77.394903 1819662.048387 84.550587 83.749984 1847252.634921 91.21501 90.157444 1960922.936508 94.74633 93.664003 1809744.966667
0 GOOGL 391.94 386.54 4710365.566667 427.770781 420.828594 4744065.375 446.389219 440.9575 3683857.9375 510.070476 503.04619 3619180.253968 589.497869 580.325738 4558310.721311 557.104603 545.903333 2446307.904762 593.085625 585.090781 1612151.828125 551.79746 542.204127 2198303.619048 546.473607 536.552951 2368245.360656 553.551905 545.467778 1767785.650794 651.789531 636.782187 2616696.96875 745.732619 731.616508 2054101.825397 745.071836 727.587508 2133765.016393 739.610859 728.390531 1371835.65625 785.743031 777.097047 1061037.15625 806.646873 793.98481 1732969.52381 844.654742 835.665145 1500393.129032 941.14981 928.679 1699425.68254 954.277016 941.745508 1649185.634921 1018.2223 1004.470233 1591438.266667
0 MO 27.804733 27.513033 13021175.25 30.020234 29.596156 11530296.125 29.783438 29.420891 10201472.640625 31.517444 31.145762 8860092.920635 31.564984 31.140131 10249242.163934 35.299429 34.885984 8557514.396825 38.049359 37.613687 7041493.1875 43.771159 43.163921 7621816.238095 48.308754 47.592033 8041661.639344 46.593317 45.944587 7200026.952381 49.801453 48.957937 7766711.328125 54.317063 53.491048 6346995.714286 57.344525 56.343689 7185249.213115 61.154234 60.290312 6438985.9375 64.032469 63.298719 5463859.625 62.576143 61.735317 6994040.809524 70.895935 70.206661 6767171.483871 72.274222 71.559794 5800049.904762 66.655587 65.611365 8136257.253968 64.755757 63.85896 6578395.933333
0 AMZN 269.118 262.854 3522848.05 269.194062 263.381406 3039878.9375 300.553125 294.989531 2420409.5625 362.279683 354.804127 2940043.174603 375.857541 366.867541 3931548.016393 320.385238 311.777143 4657967.349206 335.560938 329.103437 3587100.65625 315.103968 307.935556 4204031.571429 355.356557 347.714426 3923801.163934 421.807937 414.614286 3037333.206349 512.614375 500.030937 4089669.734375 635.442857 621.514841 4296253.888889 576.212148 557.810295 4989470.04918 681.847781 670.028109 3135022.640625 769.382875 760.123859 2378967.3125 791.033063 777.629952 3940060.904762 836.813048 827.159419 3134660.548387 961.248302 946.97246 3873340.206349 989.650429 975.274317 3407886.507937 1038.752167 1022.620467 3624009.266667
0 AEE 27.581883 27.2762 2239801.833333 29.653313 29.193766 2050389.765625 29.771781 29.333625 1813607.015625 31.188175 30.726413 1682456.365079 33.811361 33.330541 2031949.606557 35.396603 34.857952 2168062.507937 35.06275 34.561078 1501801.734375 38.459984 37.736508 1714479.365079 39.92759 39.136557 2078598.606557 37.14381 36.593 1494496.555556 37.614984 36.93275 1793303.921875 40.935873 40.251238 1859961.571429 43.962672 43.098574 2283694.459016 47.250312 46.486203 2023767.28125 49.458922 48.746828 1539681.015625 48.492365 47.693873 1554330.47619 52.6895 52.027306 1447328.193548 55.068048 54.443286 1583701.587302 57.753825 57.100922 1367012.222222 61.177167 60.499507 1271617.766667
0 AAL NaN NaN NaN NaN NaN NaN NaN NaN NaN 25.450929 24.744214 13773163.642857 33.337885 32.335541 11510576.688525 38.284619 37.01619 12882764.603175 38.769172 37.566609 10396599.125 42.478651 40.844746 16353052.047619 50.163951 48.608475 15114543.704918 45.286794 43.947 12909723.285714 40.863875 39.51525 11103670.140625 42.985556 41.79873 8641494.095238 40.070148 38.767656 9220325.590164 33.859359 32.891188 9850499.3125 35.320672 34.3435 8649792.484375 43.729714 42.570048 6898510.587302 45.53771 44.359613 6820026.177419 46.744032 45.639381 6349688.460317 48.805619 47.672143 5820506.15873 50.257267 49.0754 4891022.533333
0 AEP 38.16775 37.795517 3277158.716667 40.749 40.151016 3817119.09375 38.155797 37.600766 3865680.796875 39.953571 39.392444 4153897.253968 42.539951 41.96441 3284502.04918 46.640651 45.978524 3352281.825397 47.244516 46.573125 2618190.4375 51.561397 50.680635 3023673.285714 54.140082 53.113279 3176102.278689 50.828159 50.138587 2712356.873016 51.449766 50.550516 3062072.65625 53.019968 52.143 3349036.365079 58.439508 57.327066 3541953.245902 62.276391 61.355812 2758869.453125 64.582781 63.727406 2528305.421875 59.796524 58.856286 2880379.142857 63.224661 62.524597 2422669.870968 68.402206 67.732016 2461902.587302 70.743556 70.028413 2350829.126984 73.047233 72.237 2089546.366667
0 AXP 58.424617 57.642217 5916452.983333 67.144656 66.072609 6094520.890625 71.165656 70.142172 4506820.21875 77.688333 76.752143 3872488.825397 85.156115 83.758525 4386631.245902 85.84054 84.663698 3528490.063492 86.043828 85.038156 3900152.140625 85.935381 84.675778 4373285.539683 80.077197 78.707738 8398035.081967 76.511556 75.58981 6278623.126984 74.554031 73.371109 6229667.703125 70.75154 69.766095 5752311.52381 56.939426 55.732574 8905854.016393 62.205594 61.188281 5104626.359375 63.319953 62.471031 4194254.015625 68.611254 67.594175 4694169.079365 77.76121 76.76821 4369751.467742 78.962667 78.062667 3541468.888889 86.027794 85.152111 3221804.190476 93.794167 92.721347 3273773.8
0 AIG 35.63535 34.937367 18662677.1 40.712 39.644031 18893028.9375 44.973734 44.233453 10869456.828125 47.204571 46.434095 10460289.984127 47.300574 46.446148 10389889.934426 50.300889 49.538619 8359558.063492 52.024625 51.378281 7492150.28125 51.222714 50.449048 9040392.380952 51.376623 50.522656 9553777.983607 56.659952 55.859921 8016945.84127 58.858641 57.882594 8173436.71875 58.965571 58.01373 9109713.015873 52.664508 51.560426 10310924.180328 53.940625 53.150031 7151242.59375 56.055703 55.406781 5828523.03125 61.984254 61.160714 6180682.111111 63.744532 62.847677 6946704.741935 62.036651 61.279222 6627421.428571 62.432125 61.689286 4870472.984127 63.60202 62.785417 5001191.566667
0 AMT 71.171967 70.203417 2681577.35 74.151641 72.700094 2828155.21875 67.497469 66.230937 4176361.875 72.705429 71.528095 2493615.31746 76.560115 75.372115 2451125.47541 81.45619 80.470778 1942582.888889 90.019438 88.932203 1950203.6875 93.363698 92.054143 2159107.634921 92.929443 91.358689 3063168.229508 90.779127 89.418984 2496382.952381 90.769953 89.317141 2429212.671875 94.292524 92.840778 2010147.253968 91.148311 89.217967 2612946.459016 103.817969 102.446625 1833266.359375 112.187187 110.758594 1892188.5625 107.569206 105.636381 2785039.603175 109.017419 107.594516 2162754.129032 127.789841 126.001587 2019509.253968 139.51873 137.524381 1779885.222222 142.252067 139.603367 2029483.5
0 AWK 35.237433 34.778283 1025179.866667 37.365094 36.784578 992241.484375 37.703328 37.110172 913628.921875 38.442111 37.875413 774682.396825 39.933492 39.397492 809006.459016 43.639984 43.091905 810337.079365 45.719391 45.128344 622071.796875 48.767746 47.991571 787525.825397 51.57123 50.692295 945761.213115 50.208984 49.49719 928097.31746 50.117781 49.294469 916592.296875 55.614905 54.709651 921576.619048 63.22118 62.047738 2237006.491803 72.439109 71.299516 1342465.15625 76.716813 75.599984 1249975.6875 71.402063 70.18054 1046900.952381 73.440645 72.457113 942151.951613 78.155667 77.214238 889762.333333 80.783873 79.917619 749040.920635 86.3166 85.294267 863855.366667
0 AMP 61.237867 60.3713 1646950.916667 70.256109 68.886281 1516202.953125 80.160359 78.970172 1108717.375 93.929937 92.585571 1151260.777778 100.700115 98.922295 1392152.0 103.056683 101.413302 1117441.873016 113.578438 112.099375 862742.515625 117.761111 115.726365 983218.507937 123.361639 121.201967 1069675.52459 119.874762 118.00873 1595580.619048 111.807031 109.493828 1358759.46875 106.46346 104.32081 1108963.15873 86.277902 83.776607 1847693.655738 93.391797 91.510938 1325956.453125 94.218453 92.751344 900529.609375 103.961698 101.794968 1222438.666667 122.39371 120.047097 1158429.677419 126.434286 124.307619 1094924.507937 139.690794 137.466032 1047268.111111 155.4065 153.385 777338.433333
0 AME 40.1458 39.5763 890057.9 40.912266 40.243781 1132761.3125 44.16825 43.51375 1679207.875 47.487794 46.641397 1077237.047619 50.919082 50.149918 893441.327869 51.665873 50.94927 819664.460317 51.114672 50.49175 920995.015625 50.130683 49.184952 1456005.825397 50.591525 49.795426 1205173.131148 53.021397 52.373635 1035037.888889 53.579391 52.651234 1434127.71875 54.565159 53.666889 1389681.47619 47.504541 46.49777 1716285.622951 47.942813 47.254188 1538806.34375 47.539391 46.913469 1536147.125 47.530635 46.773587 2382602.873016 52.837903 52.102081 1744188.596774 59.134095 58.403714 1455446.365079 63.455905 62.755048 903732.825397 68.162663 67.344667 1160574.3

Unemployment and GDP Excel¶

Unemployment and GDP Excel Finally, quarterly GDP and unemployment are scraped from an excel file found at https://www.cbo.gov/system/files/2019-04/51137-2017-06-potentialgdp-2.xlsx.We took this data set because it focused on the 2 economic indicators which we wanted to focus on. We chose GDP and unemployment because they are indicators which will affect a majority of the GICS Sectors (industries) that our data was grouped into, however, their effects on it are not as obvious as something like interest rates. This data set showed us the Real GDP, Nominal GDP, Unemployment Underlying Long Term, and Unemployment Natural (defined below):

1. Real GDP: Measures the actual growth of production without any distorting effects from inflation. 

2. Nominal GDP: Total value of all goods and services produced in a give  time period, usually quarterly or annually. 

3. Unemployment Underlying Long Term: Refers to the number of previously employed people in the U.S. who have been facing unemployment for at least 6 months. 

4. Unemployment Natural: Refers to the number of previously employed people in the U.S. who are currently facing unemployment.

Link to Nominal vs. Real GDP: https://www.investopedia.com/ask/answers/030515/when-do-economists-use-real-gdp-instead-gdp.asp#:~:text=Key%20Takeaways,any%20distorting%20effects%20from%20inflation.

In [ ]:
#gdp and unemployment excel file
#starts extracting data at 2013Q1(252 rows down)
gdp_unemp_df = pd.read_excel("gdp_unemployment_data.xlsx", sheet_name='3. GDP and Unemployment', skiprows=252)

#removes empty columns
gdp_unemp_df = gdp_unemp_df.iloc[:, [0,1,2,4,5]]

#renames columns
gdp_unemp_df.columns = ['Quarter', 'Real GDP', 'Nominal GDP', 'Unemployment Underlying Long Term', 'Unemployment Natural']

#drops all rows that have NaN as Nominal GDP
gdp_unemp_df = gdp_unemp_df.dropna(subset=['Nominal GDP'])

# Filter to include only rows up to 2017Q4
gdp_unemp_df = gdp_unemp_df[(gdp_unemp_df['Quarter'] >= '2013Q1') & (gdp_unemp_df['Quarter'] <= '2017Q4')]
gdp_unemp_df.set_index("Quarter", inplace=True)
# print(gdp_unemp_df.to_string(index=False))
gdp_unemp_df.head(20)
Out[ ]:
Real GDP Nominal GDP Unemployment Underlying Long Term Unemployment Natural
Quarter
2013Q1 15957.4 16965.6 5.048 5.540
2013Q2 16017.8 17069.4 5.025 5.395
2013Q3 16079.7 17223.3 5.002 5.292
2013Q4 16142.8 17381.6 4.976 5.176
2014Q1 16206.5 17525.7 4.953 5.073
2014Q2 16272.1 17690.1 4.930 4.990
2014Q3 16338.6 17838.2 4.915 4.935
2014Q4 16405.7 17934.9 4.879 4.879
2015Q1 16473.1 18006.3 4.830 4.830
2015Q2 16540.2 18181.3 4.785 4.785
2015Q3 16607.0 18312.2 4.765 4.765
2015Q4 16673.3 18423.7 4.755 4.755
2016Q1 16738.0 18518.1 4.749 4.749
2016Q2 16801.3 18694.4 4.745 4.745
2016Q3 16864.3 18831.0 4.743 4.743
2016Q4 16927.7 18999.3 4.742 4.742
2017Q1 16992.2 19178.8 4.740 4.740
2017Q2 17058.2 19280.7 4.738 4.738
2017Q3 17125.5 19468.0 4.737 4.737
2017Q4 17194.2 19647.5 4.735 4.735

Data Processing (Cont'd)

Next, in order to eventually visualize and analyze the data we must merge the wikipedia information (ticker, security, and sector) with the newly formed kaggle stock data (quarterly high and low values between 2013-2017).

In [ ]:
#merging the stock dataframe from wiki and the kaggle stock data frame. 
#Merge on symbol so we can work with 1 dataframe
stocks_df = pd.merge(stocks_df, kaggle_df, on="Symbol")
stocks_df.head(20)
Out[ ]:
Symbol Security GICS Sector 2013 Q1 Mean High 2013 Q1 Mean Low 2013 Q1 Mean Volume 2013 Q2 Mean High 2013 Q2 Mean Low 2013 Q2 Mean Volume 2013 Q3 Mean High 2013 Q3 Mean Low 2013 Q3 Mean Volume 2013 Q4 Mean High 2013 Q4 Mean Low 2013 Q4 Mean Volume 2014 Q1 Mean High 2014 Q1 Mean Low 2014 Q1 Mean Volume 2014 Q2 Mean High 2014 Q2 Mean Low 2014 Q2 Mean Volume 2014 Q3 Mean High 2014 Q3 Mean Low 2014 Q3 Mean Volume 2014 Q4 Mean High 2014 Q4 Mean Low 2014 Q4 Mean Volume 2015 Q1 Mean High 2015 Q1 Mean Low 2015 Q1 Mean Volume 2015 Q2 Mean High 2015 Q2 Mean Low 2015 Q2 Mean Volume 2015 Q3 Mean High 2015 Q3 Mean Low 2015 Q3 Mean Volume 2015 Q4 Mean High 2015 Q4 Mean Low 2015 Q4 Mean Volume 2016 Q1 Mean High 2016 Q1 Mean Low 2016 Q1 Mean Volume 2016 Q2 Mean High 2016 Q2 Mean Low 2016 Q2 Mean Volume 2016 Q3 Mean High 2016 Q3 Mean Low 2016 Q3 Mean Volume 2016 Q4 Mean High 2016 Q4 Mean Low 2016 Q4 Mean Volume 2017 Q1 Mean High 2017 Q1 Mean Low 2017 Q1 Mean Volume 2017 Q2 Mean High 2017 Q2 Mean Low 2017 Q2 Mean Volume 2017 Q3 Mean High 2017 Q3 Mean Low 2017 Q3 Mean Volume 2017 Q4 Mean High 2017 Q4 Mean Low 2017 Q4 Mean Volume
0 MMM 3M Industrials 91.312167 90.39165 3000908.633333 98.020156 96.832453 3184629.796875 105.455313 104.375 2593571.890625 115.978413 114.62381 3003412.984127 122.193607 120.529344 3286431.163934 129.457937 128.161111 2362018.47619 133.417656 132.191094 2302554.203125 143.35873 141.33619 3041470.253968 155.016885 152.888197 2458525.803279 151.794762 150.024921 2440807.730159 140.275156 138.233281 2956910.703125 147.416667 145.417778 2596462.174603 148.371311 146.186393 2821154.639344 163.714844 161.959375 1975933.640625 174.425313 172.915781 1730216.75 169.834127 167.954603 2021322.095238 181.455806 179.752903 1989174.451613 199.043492 197.367778 1849468.174603 207.525206 205.418159 1684750.68254 225.0516 222.543667 1925851.566667
1 AOS A. O. Smith Industrials 16.693183 16.394633 1580067.933333 17.946859 17.548625 1883052.6875 20.077312 19.714828 1230033.8125 24.618 24.195079 1645559.492063 23.568213 23.127361 1678852.622951 23.258302 22.849127 1349398.031746 23.570563 23.206109 1015275.21875 25.370365 24.94981 999006.349206 29.686967 29.156984 1118223.540984 33.707444 33.210063 1020983.460317 33.785922 32.879422 1674719.625 37.247635 36.402302 1528875.650794 34.733787 33.713066 1840718.47541 40.014906 39.273094 1472523.25 46.386844 45.709766 1067305.671875 48.150698 47.271032 1280703.47619 49.546 48.752032 919397.596774 54.049048 53.272238 869093.190476 56.258095 55.529968 1342885.873016 60.550513 59.777517 862284.733333
2 ABT Abbott Health Care 30.724817 30.309883 12451680.116667 33.482594 32.897703 7951001.546875 32.113391 31.673766 8005789.8125 33.834984 33.331286 7459989.047619 35.685213 35.102541 9174882.016393 36.46273 35.985317 6591339.587302 39.491984 39.013391 5035284.0 40.963206 40.289476 5394024.698413 43.385066 42.705344 5984359.327869 45.528857 44.942651 4981663.079365 45.127313 44.260953 7043142.28125 42.363111 41.620952 6665399.507937 38.360328 37.510508 8101788.295082 38.6665 38.062969 10643153.84375 41.879109 41.256594 9773374.203125 39.127889 38.410063 8882569.047619 42.683161 42.123548 8615529.145161 45.023984 44.466397 6880527.587302 50.187952 49.569937 5433541.793651 55.202797 54.498413 5491270.8
3 ABBV AbbVie Health Care 31.711379 30.96831 10897951.293103 37.874344 36.874609 7384147.078125 38.554094 37.840172 4690873.125 43.141349 42.268556 6319228.111111 44.859967 43.921311 7609473.032787 46.878524 45.901365 6642542.222222 49.877344 48.994797 13670690.390625 57.269476 55.990286 13088959.507937 55.40041 54.023197 13034415.47541 60.105444 59.026444 12052673.666667 60.615578 59.162406 10942893.65625 54.219746 52.613365 9892407.507937 53.032361 51.245393 9461727.672131 58.174437 56.981422 9176243.015625 62.272141 61.371672 7220892.84375 59.454603 58.236333 7871293.349206 61.587597 60.726435 7046768.645161 66.49673 65.606238 5782878.920635 76.087587 74.74454 6246636.222222 93.05493 90.88035 6171383.966667
4 ACN Accenture Information Technology 65.413133 64.46965 2958136.65 71.669828 70.472016 3545633.890625 66.856953 65.881375 3753502.28125 68.918381 67.920349 3565505.31746 75.105098 73.927754 3385196.557377 74.092238 73.077921 3025692.761905 74.035734 73.190891 2751022.140625 77.558571 76.471746 3074781.619048 83.717639 82.393557 2900945.442623 90.404254 89.257111 2328701.730159 94.127406 92.481 2749655.25 100.857413 99.433635 2597977.174603 98.467754 96.567508 3563175.213115 111.829062 110.387188 2252809.296875 110.0425 108.5325 2355634.875 116.823968 115.13 2508397.920635 117.692258 116.198548 2577157.33871 121.100476 119.760794 2231790.698413 129.643016 128.244762 1905023.0 140.0467 138.514 1750601.6
5 ADBE Adobe Inc. Information Technology 39.7829 39.11015 3569916.466667 44.708719 43.920484 3709883.078125 48.072375 47.381406 3170725.46875 55.64381 54.627508 3985544.47619 64.792295 63.390164 3924305.557377 65.134952 63.753222 3614820.587302 71.433422 70.293672 2636903.1875 70.770476 69.400349 3332754.111111 75.196279 73.791721 2366642.131148 78.606508 77.328 2462360.793651 81.957266 80.139375 3419304.015625 90.60054 88.735 2952103.492063 87.827552 85.323174 3681589.95082 96.7314 95.154775 2111794.109375 100.877748 99.587905 1807114.6875 106.782841 105.022873 2293033.888889 117.874855 116.368194 2306142.419355 137.783524 135.854397 2570180.857143 150.173397 148.012063 1972602.984127 166.874633 163.9689 3072533.633333
6 AMD Advanced Micro Devices Information Technology 2.672167 2.564667 18729917.45 3.521875 3.371563 28015006.546875 3.874219 3.744844 29267395.953125 3.633492 3.531429 24142526.84127 3.882131 3.758033 25269923.836066 4.116508 3.99381 21170687.333333 4.112656 3.992812 29876772.28125 2.809206 2.703492 18739877.619048 2.868279 2.74918 17290156.918033 2.463016 2.385556 14835027.238095 1.936641 1.833281 13248920.109375 2.268175 2.167619 9464912.730159 2.35041 2.198033 14099698.442623 3.985508 3.764922 22684549.421875 6.523773 6.237422 35252396.84375 8.582698 8.168543 46450313.365079 12.959565 12.396373 64543556.741935 12.506833 11.939441 76875665.444444 13.323238 12.81497 64954818.825397 12.990133 12.568997 61987566.333333
7 AES AES Corporation Utilities 10.10842 9.914168 7043960.75 11.204391 10.953047 6695538.890625 11.321672 11.107187 4528306.390625 12.618524 12.345254 6451758.952381 12.553852 12.295115 6094453.245902 12.804984 12.569683 6143209.666667 13.290375 13.076266 4156436.75 12.298159 11.993286 6739359.666667 11.363377 11.076623 7149936.786885 12.214016 11.966571 6735586.222222 11.136202 10.871042 6675684.125 9.324605 9.074429 6452733.015873 9.368844 9.050907 6921824.786885 10.665641 10.432906 5180647.8125 11.885609 11.656453 4655622.859375 11.352571 11.085476 5835446.761905 11.157871 10.906742 5824308.983871 11.298889 11.09727 5616984.539683 11.091317 10.909143 4938647.380952 10.9345 10.755767 5099342.833333
8 AFL Aflac Financials 46.2141 45.433267 4104190.6 49.176641 48.289062 3435438.578125 54.800359 54.012328 2189281.828125 59.970825 59.241762 2141384.761905 58.619787 57.822098 2175752.688525 57.771492 57.017746 1926278.460317 56.681 56.033297 2120122.046875 55.331175 54.628984 2444321.063492 57.613 56.80082 2244606.655738 60.131063 59.417365 2073244.571429 57.887156 56.950672 2387365.484375 59.969111 59.055286 2358668.0 57.645623 56.610869 2563169.836066 66.453531 65.644219 2310928.90625 71.264313 70.539875 1739223.609375 69.491175 68.60873 2020415.47619 70.262032 69.446823 2214819.548387 74.90873 74.134111 1846555.492063 80.695032 79.83694 1541004.079365 84.0718 83.197033 1440761.533333
9 A Agilent Technologies Health Care 29.667933 29.165533 5100460.5 30.035625 29.418578 5595136.578125 32.569641 32.047453 3832068.15625 36.757286 36.169667 3353795.761905 40.050869 39.277967 4170020.704918 39.052714 38.360952 2721890.15873 39.777484 39.240562 2585435.890625 39.484873 38.748032 3063606.825397 39.669721 38.975672 2532460.409836 40.752206 40.151476 2448139.761905 37.349875 36.627813 2521089.796875 38.163619 37.420571 3004962.285714 37.845246 36.91459 2357458.967213 42.781375 42.130453 2219362.46875 46.398047 45.748719 1877364.125 45.384619 44.604825 1990839.174603 50.474952 49.724629 2029887.822581 57.100619 56.334349 2009349.047619 62.56181 61.779984 1643051.47619 67.413667 66.701367 1397107.4
10 APD Air Products and Chemicals Materials 77.760883 76.798833 1093138.95 81.289891 80.249953 1552260.046875 92.838203 91.196234 2046325.296875 98.905492 97.454413 1143625.253968 104.973115 103.321607 1455185.934426 110.916032 109.299365 1199132.857143 122.2025 120.556875 1628685.890625 127.274921 125.009206 1710348.936508 139.970164 137.682459 1108369.032787 138.894444 136.88254 1078376.428571 130.466094 127.980312 1500850.15625 129.770635 127.481429 1433344.746032 125.588852 122.712131 1905698.672131 138.716563 136.732812 1064761.28125 145.268594 143.489688 1379288.796875 137.335556 135.394921 1452917.984127 139.034194 137.274355 1545373.048387 141.140476 139.533016 1303113.492063 146.273016 144.856825 1410870.507937 156.523133 155.0267 926571.433333
11 AKAM Akamai Information Technology 38.345833 37.470117 3942939.716667 42.344391 41.340719 2741468.78125 47.475531 46.609094 1724353.609375 47.71781 46.69873 2251961.84127 55.955328 54.503295 2719331.836066 56.288587 54.859762 2191404.68254 60.705016 59.619875 1624116.5 60.925302 59.595159 1622168.206349 66.694918 65.303443 1631151.639344 74.717937 73.624524 1419948.698413 72.898438 71.173375 1800375.140625 62.297746 60.930667 2270051.634921 51.614267 49.998952 2834324.114754 53.284647 51.860842 1531503.703125 54.033789 53.00923 1895174.203125 64.308589 62.977471 2097015.238095 66.29605 65.184232 1916476.16129 53.296676 52.291944 2627907.0 48.271071 47.413467 1919155.206349 52.275247 51.428977 2122564.7
12 ALB Albemarle Corporation Materials 59.422 58.35975 738925.166667 58.865891 57.647281 798707.59375 59.128953 58.218312 804109.703125 62.520635 61.596651 662333.936508 61.266967 60.274475 631124.557377 65.074079 64.164619 533168.222222 62.160953 60.983437 1437183.359375 56.594016 55.267 1462997.079365 52.807508 51.484361 2007614.065574 57.841937 56.750413 1095955.634921 47.790344 46.420312 1471876.21875 50.553 49.121714 1480702.650794 54.459115 52.543164 1433544.245902 72.837594 71.230234 1899285.328125 81.901984 80.286344 1283941.09375 84.576016 82.875873 888396.555556 97.426516 95.477742 1036087.532258 109.16 107.409206 1023728.301587 119.617524 117.308175 1430350.84127 140.3418 137.821167 1259730.2
13 ARE Alexandria Real Estate Equities Real Estate 61.36855 60.743033 510877.366667 61.040516 59.928797 812680.375 57.557531 56.515922 497802.3125 56.650143 55.566413 487156.761905 62.462639 61.485738 523381.442623 67.283714 66.416952 380025.349206 70.467438 69.669313 459933.390625 76.38346 75.279413 490275.349206 89.050443 87.467197 471547.639344 86.529873 85.126968 399307.920635 84.633359 83.013094 364752.734375 85.659873 84.143825 468813.809524 77.99 76.298508 515845.04918 91.893234 90.337047 623097.203125 105.790109 103.950516 587047.640625 105.292381 103.161841 609395.222222 111.310645 109.527903 1028053.66129 115.439048 113.911905 727220.412698 120.099048 118.52127 589167.52381 123.519667 122.114167 492825.366667
14 ALGN Align Technology Health Care 31.159017 30.377733 965606.483333 34.975797 33.909 932521.21875 44.089359 43.058906 623338.703125 54.480365 53.042889 1102201.47619 56.988443 55.11541 1269293.868852 52.782825 51.130048 1145288.68254 55.01075 53.947656 664095.84375 53.482111 52.269952 715800.730159 57.678639 56.164098 932696.57377 59.708095 58.407381 876989.904762 60.373438 58.926797 730518.59375 64.794143 63.197921 649318.666667 66.427911 64.275434 654232.180328 77.125 75.716822 578877.9375 91.181948 89.562223 596522.5 94.061211 91.813589 876983.809524 101.910419 99.875898 787098.354839 137.06419 134.170984 1255167.301587 172.816968 169.111603 868528.095238 213.796 208.346733 1115914.7
15 ALLE Allegion Industrials NaN NaN NaN NaN NaN NaN NaN NaN NaN 42.559722 41.557444 2326585.333333 49.590262 48.319852 1270741.557377 51.807556 50.721571 705595.587302 51.752297 50.791828 587033.5625 51.206841 50.206476 723700.015873 56.497984 55.437393 762305.081967 60.683825 59.811143 564986.84127 60.533687 59.331172 562724.65625 63.634683 62.399492 778905.380952 61.081 59.526738 912952.032787 66.168641 65.057437 652007.0 70.223922 69.284297 588783.5 66.34781 65.213746 800261.730159 70.690645 69.571823 775755.725806 78.576492 77.469111 747584.571429 81.224873 80.155159 517902.968254 86.157333 84.726927 796675.4
16 LNT Alliant Energy Utilities 19.833667 19.630533 1096029.6 21.7005 21.357781 1031537.703125 22.163375 21.863391 1133217.8125 22.626413 22.287175 1118667.571429 23.384557 23.072443 1122770.180328 25.851143 25.529857 1148605.238095 25.994484 25.654 1289851.59375 28.091159 27.61719 1222716.936508 29.901721 29.356951 1519714.639344 28.033143 27.640937 1334767.587302 27.474594 27.022156 1400986.171875 28.269857 27.814222 1324544.444444 32.027246 31.455148 1505386.147541 35.556656 34.981984 2127155.546875 37.931031 37.304672 1506166.765625 35.969651 35.375222 1294674.31746 37.607484 37.134 1213372.370968 39.966476 39.541794 1381354.412698 41.404048 40.971587 1011096.301587 43.097533 42.592293 1377366.533333
17 ALL Allstate Financials 41.775167 41.2432 3434803.983333 45.130828 44.478172 3923570.75 46.851547 46.229 2952061.890625 49.845937 49.225032 2502155.619048 50.444 49.757066 3070192.934426 54.400365 53.872794 2482041.920635 56.904359 56.361063 2327236.03125 62.651508 61.936 2418033.063492 67.630475 66.660803 2569832.672131 65.974413 65.215794 2616602.15873 60.50425 59.549359 3279574.0625 60.445508 59.56127 2817078.380952 61.341361 60.395262 2948406.852459 65.640938 64.841594 1997076.203125 67.744703 67.177312 1629849.046875 69.673952 68.879175 2349906.809524 78.158903 77.394903 1819662.048387 84.550587 83.749984 1847252.634921 91.21501 90.157444 1960922.936508 94.74633 93.664003 1809744.966667
18 GOOGL Alphabet Inc. (Class A) Communication Services 391.94 386.54 4710365.566667 427.770781 420.828594 4744065.375 446.389219 440.9575 3683857.9375 510.070476 503.04619 3619180.253968 589.497869 580.325738 4558310.721311 557.104603 545.903333 2446307.904762 593.085625 585.090781 1612151.828125 551.79746 542.204127 2198303.619048 546.473607 536.552951 2368245.360656 553.551905 545.467778 1767785.650794 651.789531 636.782187 2616696.96875 745.732619 731.616508 2054101.825397 745.071836 727.587508 2133765.016393 739.610859 728.390531 1371835.65625 785.743031 777.097047 1061037.15625 806.646873 793.98481 1732969.52381 844.654742 835.665145 1500393.129032 941.14981 928.679 1699425.68254 954.277016 941.745508 1649185.634921 1018.2223 1004.470233 1591438.266667
19 MO Altria Consumer Staples 27.804733 27.513033 13021175.25 30.020234 29.596156 11530296.125 29.783438 29.420891 10201472.640625 31.517444 31.145762 8860092.920635 31.564984 31.140131 10249242.163934 35.299429 34.885984 8557514.396825 38.049359 37.613687 7041493.1875 43.771159 43.163921 7621816.238095 48.308754 47.592033 8041661.639344 46.593317 45.944587 7200026.952381 49.801453 48.957937 7766711.328125 54.317063 53.491048 6346995.714286 57.344525 56.343689 7185249.213115 61.154234 60.290312 6438985.9375 64.032469 63.298719 5463859.625 62.576143 61.735317 6994040.809524 70.895935 70.206661 6767171.483871 72.274222 71.559794 5800049.904762 66.655587 65.611365 8136257.253968 64.755757 63.85896 6578395.933333

Next is the most important cleaning of the data: removing empty/null values. This section deletes any stock’s row that contains any NaNs. NaN values which were in this dataset means that we either had values of companies which weren’t created till after 2013 or shut down before 2017. This meant that these values would’ve skewed our analysis hence why we removed them.

In [ ]:
#Clean and preprocess the data to handle missing values and normalize features
#removing any tickers that have 1 or more NaN values in their data
cleaned_stocks_df = stocks_df.dropna()
cleaned_stocks_df.head(20)
Out[ ]:
Symbol Security GICS Sector 2013 Q1 Mean High 2013 Q1 Mean Low 2013 Q1 Mean Volume 2013 Q2 Mean High 2013 Q2 Mean Low 2013 Q2 Mean Volume 2013 Q3 Mean High 2013 Q3 Mean Low 2013 Q3 Mean Volume 2013 Q4 Mean High 2013 Q4 Mean Low 2013 Q4 Mean Volume 2014 Q1 Mean High 2014 Q1 Mean Low 2014 Q1 Mean Volume 2014 Q2 Mean High 2014 Q2 Mean Low 2014 Q2 Mean Volume 2014 Q3 Mean High 2014 Q3 Mean Low 2014 Q3 Mean Volume 2014 Q4 Mean High 2014 Q4 Mean Low 2014 Q4 Mean Volume 2015 Q1 Mean High 2015 Q1 Mean Low 2015 Q1 Mean Volume 2015 Q2 Mean High 2015 Q2 Mean Low 2015 Q2 Mean Volume 2015 Q3 Mean High 2015 Q3 Mean Low 2015 Q3 Mean Volume 2015 Q4 Mean High 2015 Q4 Mean Low 2015 Q4 Mean Volume 2016 Q1 Mean High 2016 Q1 Mean Low 2016 Q1 Mean Volume 2016 Q2 Mean High 2016 Q2 Mean Low 2016 Q2 Mean Volume 2016 Q3 Mean High 2016 Q3 Mean Low 2016 Q3 Mean Volume 2016 Q4 Mean High 2016 Q4 Mean Low 2016 Q4 Mean Volume 2017 Q1 Mean High 2017 Q1 Mean Low 2017 Q1 Mean Volume 2017 Q2 Mean High 2017 Q2 Mean Low 2017 Q2 Mean Volume 2017 Q3 Mean High 2017 Q3 Mean Low 2017 Q3 Mean Volume 2017 Q4 Mean High 2017 Q4 Mean Low 2017 Q4 Mean Volume
0 MMM 3M Industrials 91.312167 90.39165 3000908.633333 98.020156 96.832453 3184629.796875 105.455313 104.375 2593571.890625 115.978413 114.62381 3003412.984127 122.193607 120.529344 3286431.163934 129.457937 128.161111 2362018.47619 133.417656 132.191094 2302554.203125 143.35873 141.33619 3041470.253968 155.016885 152.888197 2458525.803279 151.794762 150.024921 2440807.730159 140.275156 138.233281 2956910.703125 147.416667 145.417778 2596462.174603 148.371311 146.186393 2821154.639344 163.714844 161.959375 1975933.640625 174.425313 172.915781 1730216.75 169.834127 167.954603 2021322.095238 181.455806 179.752903 1989174.451613 199.043492 197.367778 1849468.174603 207.525206 205.418159 1684750.68254 225.0516 222.543667 1925851.566667
1 AOS A. O. Smith Industrials 16.693183 16.394633 1580067.933333 17.946859 17.548625 1883052.6875 20.077312 19.714828 1230033.8125 24.618 24.195079 1645559.492063 23.568213 23.127361 1678852.622951 23.258302 22.849127 1349398.031746 23.570563 23.206109 1015275.21875 25.370365 24.94981 999006.349206 29.686967 29.156984 1118223.540984 33.707444 33.210063 1020983.460317 33.785922 32.879422 1674719.625 37.247635 36.402302 1528875.650794 34.733787 33.713066 1840718.47541 40.014906 39.273094 1472523.25 46.386844 45.709766 1067305.671875 48.150698 47.271032 1280703.47619 49.546 48.752032 919397.596774 54.049048 53.272238 869093.190476 56.258095 55.529968 1342885.873016 60.550513 59.777517 862284.733333
2 ABT Abbott Health Care 30.724817 30.309883 12451680.116667 33.482594 32.897703 7951001.546875 32.113391 31.673766 8005789.8125 33.834984 33.331286 7459989.047619 35.685213 35.102541 9174882.016393 36.46273 35.985317 6591339.587302 39.491984 39.013391 5035284.0 40.963206 40.289476 5394024.698413 43.385066 42.705344 5984359.327869 45.528857 44.942651 4981663.079365 45.127313 44.260953 7043142.28125 42.363111 41.620952 6665399.507937 38.360328 37.510508 8101788.295082 38.6665 38.062969 10643153.84375 41.879109 41.256594 9773374.203125 39.127889 38.410063 8882569.047619 42.683161 42.123548 8615529.145161 45.023984 44.466397 6880527.587302 50.187952 49.569937 5433541.793651 55.202797 54.498413 5491270.8
3 ABBV AbbVie Health Care 31.711379 30.96831 10897951.293103 37.874344 36.874609 7384147.078125 38.554094 37.840172 4690873.125 43.141349 42.268556 6319228.111111 44.859967 43.921311 7609473.032787 46.878524 45.901365 6642542.222222 49.877344 48.994797 13670690.390625 57.269476 55.990286 13088959.507937 55.40041 54.023197 13034415.47541 60.105444 59.026444 12052673.666667 60.615578 59.162406 10942893.65625 54.219746 52.613365 9892407.507937 53.032361 51.245393 9461727.672131 58.174437 56.981422 9176243.015625 62.272141 61.371672 7220892.84375 59.454603 58.236333 7871293.349206 61.587597 60.726435 7046768.645161 66.49673 65.606238 5782878.920635 76.087587 74.74454 6246636.222222 93.05493 90.88035 6171383.966667
4 ACN Accenture Information Technology 65.413133 64.46965 2958136.65 71.669828 70.472016 3545633.890625 66.856953 65.881375 3753502.28125 68.918381 67.920349 3565505.31746 75.105098 73.927754 3385196.557377 74.092238 73.077921 3025692.761905 74.035734 73.190891 2751022.140625 77.558571 76.471746 3074781.619048 83.717639 82.393557 2900945.442623 90.404254 89.257111 2328701.730159 94.127406 92.481 2749655.25 100.857413 99.433635 2597977.174603 98.467754 96.567508 3563175.213115 111.829062 110.387188 2252809.296875 110.0425 108.5325 2355634.875 116.823968 115.13 2508397.920635 117.692258 116.198548 2577157.33871 121.100476 119.760794 2231790.698413 129.643016 128.244762 1905023.0 140.0467 138.514 1750601.6
5 ADBE Adobe Inc. Information Technology 39.7829 39.11015 3569916.466667 44.708719 43.920484 3709883.078125 48.072375 47.381406 3170725.46875 55.64381 54.627508 3985544.47619 64.792295 63.390164 3924305.557377 65.134952 63.753222 3614820.587302 71.433422 70.293672 2636903.1875 70.770476 69.400349 3332754.111111 75.196279 73.791721 2366642.131148 78.606508 77.328 2462360.793651 81.957266 80.139375 3419304.015625 90.60054 88.735 2952103.492063 87.827552 85.323174 3681589.95082 96.7314 95.154775 2111794.109375 100.877748 99.587905 1807114.6875 106.782841 105.022873 2293033.888889 117.874855 116.368194 2306142.419355 137.783524 135.854397 2570180.857143 150.173397 148.012063 1972602.984127 166.874633 163.9689 3072533.633333
6 AMD Advanced Micro Devices Information Technology 2.672167 2.564667 18729917.45 3.521875 3.371563 28015006.546875 3.874219 3.744844 29267395.953125 3.633492 3.531429 24142526.84127 3.882131 3.758033 25269923.836066 4.116508 3.99381 21170687.333333 4.112656 3.992812 29876772.28125 2.809206 2.703492 18739877.619048 2.868279 2.74918 17290156.918033 2.463016 2.385556 14835027.238095 1.936641 1.833281 13248920.109375 2.268175 2.167619 9464912.730159 2.35041 2.198033 14099698.442623 3.985508 3.764922 22684549.421875 6.523773 6.237422 35252396.84375 8.582698 8.168543 46450313.365079 12.959565 12.396373 64543556.741935 12.506833 11.939441 76875665.444444 13.323238 12.81497 64954818.825397 12.990133 12.568997 61987566.333333
7 AES AES Corporation Utilities 10.10842 9.914168 7043960.75 11.204391 10.953047 6695538.890625 11.321672 11.107187 4528306.390625 12.618524 12.345254 6451758.952381 12.553852 12.295115 6094453.245902 12.804984 12.569683 6143209.666667 13.290375 13.076266 4156436.75 12.298159 11.993286 6739359.666667 11.363377 11.076623 7149936.786885 12.214016 11.966571 6735586.222222 11.136202 10.871042 6675684.125 9.324605 9.074429 6452733.015873 9.368844 9.050907 6921824.786885 10.665641 10.432906 5180647.8125 11.885609 11.656453 4655622.859375 11.352571 11.085476 5835446.761905 11.157871 10.906742 5824308.983871 11.298889 11.09727 5616984.539683 11.091317 10.909143 4938647.380952 10.9345 10.755767 5099342.833333
8 AFL Aflac Financials 46.2141 45.433267 4104190.6 49.176641 48.289062 3435438.578125 54.800359 54.012328 2189281.828125 59.970825 59.241762 2141384.761905 58.619787 57.822098 2175752.688525 57.771492 57.017746 1926278.460317 56.681 56.033297 2120122.046875 55.331175 54.628984 2444321.063492 57.613 56.80082 2244606.655738 60.131063 59.417365 2073244.571429 57.887156 56.950672 2387365.484375 59.969111 59.055286 2358668.0 57.645623 56.610869 2563169.836066 66.453531 65.644219 2310928.90625 71.264313 70.539875 1739223.609375 69.491175 68.60873 2020415.47619 70.262032 69.446823 2214819.548387 74.90873 74.134111 1846555.492063 80.695032 79.83694 1541004.079365 84.0718 83.197033 1440761.533333
9 A Agilent Technologies Health Care 29.667933 29.165533 5100460.5 30.035625 29.418578 5595136.578125 32.569641 32.047453 3832068.15625 36.757286 36.169667 3353795.761905 40.050869 39.277967 4170020.704918 39.052714 38.360952 2721890.15873 39.777484 39.240562 2585435.890625 39.484873 38.748032 3063606.825397 39.669721 38.975672 2532460.409836 40.752206 40.151476 2448139.761905 37.349875 36.627813 2521089.796875 38.163619 37.420571 3004962.285714 37.845246 36.91459 2357458.967213 42.781375 42.130453 2219362.46875 46.398047 45.748719 1877364.125 45.384619 44.604825 1990839.174603 50.474952 49.724629 2029887.822581 57.100619 56.334349 2009349.047619 62.56181 61.779984 1643051.47619 67.413667 66.701367 1397107.4
10 APD Air Products and Chemicals Materials 77.760883 76.798833 1093138.95 81.289891 80.249953 1552260.046875 92.838203 91.196234 2046325.296875 98.905492 97.454413 1143625.253968 104.973115 103.321607 1455185.934426 110.916032 109.299365 1199132.857143 122.2025 120.556875 1628685.890625 127.274921 125.009206 1710348.936508 139.970164 137.682459 1108369.032787 138.894444 136.88254 1078376.428571 130.466094 127.980312 1500850.15625 129.770635 127.481429 1433344.746032 125.588852 122.712131 1905698.672131 138.716563 136.732812 1064761.28125 145.268594 143.489688 1379288.796875 137.335556 135.394921 1452917.984127 139.034194 137.274355 1545373.048387 141.140476 139.533016 1303113.492063 146.273016 144.856825 1410870.507937 156.523133 155.0267 926571.433333
11 AKAM Akamai Information Technology 38.345833 37.470117 3942939.716667 42.344391 41.340719 2741468.78125 47.475531 46.609094 1724353.609375 47.71781 46.69873 2251961.84127 55.955328 54.503295 2719331.836066 56.288587 54.859762 2191404.68254 60.705016 59.619875 1624116.5 60.925302 59.595159 1622168.206349 66.694918 65.303443 1631151.639344 74.717937 73.624524 1419948.698413 72.898438 71.173375 1800375.140625 62.297746 60.930667 2270051.634921 51.614267 49.998952 2834324.114754 53.284647 51.860842 1531503.703125 54.033789 53.00923 1895174.203125 64.308589 62.977471 2097015.238095 66.29605 65.184232 1916476.16129 53.296676 52.291944 2627907.0 48.271071 47.413467 1919155.206349 52.275247 51.428977 2122564.7
12 ALB Albemarle Corporation Materials 59.422 58.35975 738925.166667 58.865891 57.647281 798707.59375 59.128953 58.218312 804109.703125 62.520635 61.596651 662333.936508 61.266967 60.274475 631124.557377 65.074079 64.164619 533168.222222 62.160953 60.983437 1437183.359375 56.594016 55.267 1462997.079365 52.807508 51.484361 2007614.065574 57.841937 56.750413 1095955.634921 47.790344 46.420312 1471876.21875 50.553 49.121714 1480702.650794 54.459115 52.543164 1433544.245902 72.837594 71.230234 1899285.328125 81.901984 80.286344 1283941.09375 84.576016 82.875873 888396.555556 97.426516 95.477742 1036087.532258 109.16 107.409206 1023728.301587 119.617524 117.308175 1430350.84127 140.3418 137.821167 1259730.2
13 ARE Alexandria Real Estate Equities Real Estate 61.36855 60.743033 510877.366667 61.040516 59.928797 812680.375 57.557531 56.515922 497802.3125 56.650143 55.566413 487156.761905 62.462639 61.485738 523381.442623 67.283714 66.416952 380025.349206 70.467438 69.669313 459933.390625 76.38346 75.279413 490275.349206 89.050443 87.467197 471547.639344 86.529873 85.126968 399307.920635 84.633359 83.013094 364752.734375 85.659873 84.143825 468813.809524 77.99 76.298508 515845.04918 91.893234 90.337047 623097.203125 105.790109 103.950516 587047.640625 105.292381 103.161841 609395.222222 111.310645 109.527903 1028053.66129 115.439048 113.911905 727220.412698 120.099048 118.52127 589167.52381 123.519667 122.114167 492825.366667
14 ALGN Align Technology Health Care 31.159017 30.377733 965606.483333 34.975797 33.909 932521.21875 44.089359 43.058906 623338.703125 54.480365 53.042889 1102201.47619 56.988443 55.11541 1269293.868852 52.782825 51.130048 1145288.68254 55.01075 53.947656 664095.84375 53.482111 52.269952 715800.730159 57.678639 56.164098 932696.57377 59.708095 58.407381 876989.904762 60.373438 58.926797 730518.59375 64.794143 63.197921 649318.666667 66.427911 64.275434 654232.180328 77.125 75.716822 578877.9375 91.181948 89.562223 596522.5 94.061211 91.813589 876983.809524 101.910419 99.875898 787098.354839 137.06419 134.170984 1255167.301587 172.816968 169.111603 868528.095238 213.796 208.346733 1115914.7
16 LNT Alliant Energy Utilities 19.833667 19.630533 1096029.6 21.7005 21.357781 1031537.703125 22.163375 21.863391 1133217.8125 22.626413 22.287175 1118667.571429 23.384557 23.072443 1122770.180328 25.851143 25.529857 1148605.238095 25.994484 25.654 1289851.59375 28.091159 27.61719 1222716.936508 29.901721 29.356951 1519714.639344 28.033143 27.640937 1334767.587302 27.474594 27.022156 1400986.171875 28.269857 27.814222 1324544.444444 32.027246 31.455148 1505386.147541 35.556656 34.981984 2127155.546875 37.931031 37.304672 1506166.765625 35.969651 35.375222 1294674.31746 37.607484 37.134 1213372.370968 39.966476 39.541794 1381354.412698 41.404048 40.971587 1011096.301587 43.097533 42.592293 1377366.533333
17 ALL Allstate Financials 41.775167 41.2432 3434803.983333 45.130828 44.478172 3923570.75 46.851547 46.229 2952061.890625 49.845937 49.225032 2502155.619048 50.444 49.757066 3070192.934426 54.400365 53.872794 2482041.920635 56.904359 56.361063 2327236.03125 62.651508 61.936 2418033.063492 67.630475 66.660803 2569832.672131 65.974413 65.215794 2616602.15873 60.50425 59.549359 3279574.0625 60.445508 59.56127 2817078.380952 61.341361 60.395262 2948406.852459 65.640938 64.841594 1997076.203125 67.744703 67.177312 1629849.046875 69.673952 68.879175 2349906.809524 78.158903 77.394903 1819662.048387 84.550587 83.749984 1847252.634921 91.21501 90.157444 1960922.936508 94.74633 93.664003 1809744.966667
18 GOOGL Alphabet Inc. (Class A) Communication Services 391.94 386.54 4710365.566667 427.770781 420.828594 4744065.375 446.389219 440.9575 3683857.9375 510.070476 503.04619 3619180.253968 589.497869 580.325738 4558310.721311 557.104603 545.903333 2446307.904762 593.085625 585.090781 1612151.828125 551.79746 542.204127 2198303.619048 546.473607 536.552951 2368245.360656 553.551905 545.467778 1767785.650794 651.789531 636.782187 2616696.96875 745.732619 731.616508 2054101.825397 745.071836 727.587508 2133765.016393 739.610859 728.390531 1371835.65625 785.743031 777.097047 1061037.15625 806.646873 793.98481 1732969.52381 844.654742 835.665145 1500393.129032 941.14981 928.679 1699425.68254 954.277016 941.745508 1649185.634921 1018.2223 1004.470233 1591438.266667
19 MO Altria Consumer Staples 27.804733 27.513033 13021175.25 30.020234 29.596156 11530296.125 29.783438 29.420891 10201472.640625 31.517444 31.145762 8860092.920635 31.564984 31.140131 10249242.163934 35.299429 34.885984 8557514.396825 38.049359 37.613687 7041493.1875 43.771159 43.163921 7621816.238095 48.308754 47.592033 8041661.639344 46.593317 45.944587 7200026.952381 49.801453 48.957937 7766711.328125 54.317063 53.491048 6346995.714286 57.344525 56.343689 7185249.213115 61.154234 60.290312 6438985.9375 64.032469 63.298719 5463859.625 62.576143 61.735317 6994040.809524 70.895935 70.206661 6767171.483871 72.274222 71.559794 5800049.904762 66.655587 65.611365 8136257.253968 64.755757 63.85896 6578395.933333
20 AMZN Amazon Consumer Discretionary 269.118 262.854 3522848.05 269.194062 263.381406 3039878.9375 300.553125 294.989531 2420409.5625 362.279683 354.804127 2940043.174603 375.857541 366.867541 3931548.016393 320.385238 311.777143 4657967.349206 335.560938 329.103437 3587100.65625 315.103968 307.935556 4204031.571429 355.356557 347.714426 3923801.163934 421.807937 414.614286 3037333.206349 512.614375 500.030937 4089669.734375 635.442857 621.514841 4296253.888889 576.212148 557.810295 4989470.04918 681.847781 670.028109 3135022.640625 769.382875 760.123859 2378967.3125 791.033063 777.629952 3940060.904762 836.813048 827.159419 3134660.548387 961.248302 946.97246 3873340.206349 989.650429 975.274317 3407886.507937 1038.752167 1022.620467 3624009.266667

For this tutorial’s purposes, we want just an average price for each stock for each quarter from 2014 to 2017. This block averages the quarterly mean high and mean low of each stock and places it into a new column just called Avg.

In [ ]:
#Clean the data so that we average the high and low of each quarter to get one fixed average stock price. 
years = list(range(2013, 2018))
quarters = ["Q1", "Q2", "Q3", "Q4"]

for year in years:
    for quarter in quarters:
        year_quarter = f"{year} {quarter}"
        high_col = f"{year_quarter} Mean High"
        low_col = f"{year_quarter} Mean Low"
        
        high = cleaned_stocks_df[high_col]
        low = cleaned_stocks_df[low_col]
        
        col_index = cleaned_stocks_df.columns.get_loc(high_col)
        cleaned_stocks_df.insert(col_index, f"{year_quarter} Avg", (high + low) / 2)
        
        cleaned_stocks_df = cleaned_stocks_df.drop([high_col, low_col], axis=1)
        
cleaned_stocks_df.head(20)
Out[ ]:
Symbol Security GICS Sector 2013 Q1 Avg 2013 Q1 Mean Volume 2013 Q2 Avg 2013 Q2 Mean Volume 2013 Q3 Avg 2013 Q3 Mean Volume 2013 Q4 Avg 2013 Q4 Mean Volume 2014 Q1 Avg 2014 Q1 Mean Volume 2014 Q2 Avg 2014 Q2 Mean Volume 2014 Q3 Avg 2014 Q3 Mean Volume 2014 Q4 Avg 2014 Q4 Mean Volume 2015 Q1 Avg 2015 Q1 Mean Volume 2015 Q2 Avg 2015 Q2 Mean Volume 2015 Q3 Avg 2015 Q3 Mean Volume 2015 Q4 Avg 2015 Q4 Mean Volume 2016 Q1 Avg 2016 Q1 Mean Volume 2016 Q2 Avg 2016 Q2 Mean Volume 2016 Q3 Avg 2016 Q3 Mean Volume 2016 Q4 Avg 2016 Q4 Mean Volume 2017 Q1 Avg 2017 Q1 Mean Volume 2017 Q2 Avg 2017 Q2 Mean Volume 2017 Q3 Avg 2017 Q3 Mean Volume 2017 Q4 Avg 2017 Q4 Mean Volume
0 MMM 3M Industrials 90.851908 3000908.633333 97.426305 3184629.796875 104.915156 2593571.890625 115.301111 3003412.984127 121.361475 3286431.163934 128.809524 2362018.47619 132.804375 2302554.203125 142.34746 3041470.253968 153.952541 2458525.803279 150.909841 2440807.730159 139.254219 2956910.703125 146.417222 2596462.174603 147.278852 2821154.639344 162.837109 1975933.640625 173.670547 1730216.75 168.894365 2021322.095238 180.604355 1989174.451613 198.205635 1849468.174603 206.471683 1684750.68254 223.797633 1925851.566667
1 AOS A. O. Smith Industrials 16.543908 1580067.933333 17.747742 1883052.6875 19.89607 1230033.8125 24.40654 1645559.492063 23.347787 1678852.622951 23.053714 1349398.031746 23.388336 1015275.21875 25.160087 999006.349206 29.421975 1118223.540984 33.458754 1020983.460317 33.332672 1674719.625 36.824968 1528875.650794 34.223426 1840718.47541 39.644 1472523.25 46.048305 1067305.671875 47.710865 1280703.47619 49.149016 919397.596774 53.660643 869093.190476 55.894032 1342885.873016 60.164015 862284.733333
2 ABT Abbott Health Care 30.51735 12451680.116667 33.190148 7951001.546875 31.893578 8005789.8125 33.583135 7459989.047619 35.393877 9174882.016393 36.224024 6591339.587302 39.252688 5035284.0 40.626341 5394024.698413 43.045205 5984359.327869 45.235754 4981663.079365 44.694133 7043142.28125 41.992032 6665399.507937 37.935418 8101788.295082 38.364734 10643153.84375 41.567852 9773374.203125 38.768976 8882569.047619 42.403355 8615529.145161 44.74519 6880527.587302 49.878944 5433541.793651 54.850605 5491270.8
3 ABBV AbbVie Health Care 31.339845 10897951.293103 37.374477 7384147.078125 38.197133 4690873.125 42.704952 6319228.111111 44.390639 7609473.032787 46.389944 6642542.222222 49.43607 13670690.390625 56.629881 13088959.507937 54.711803 13034415.47541 59.565944 12052673.666667 59.888992 10942893.65625 53.416556 9892407.507937 52.138877 9461727.672131 57.57793 9176243.015625 61.821906 7220892.84375 58.845468 7871293.349206 61.157016 7046768.645161 66.051484 5782878.920635 75.416063 6246636.222222 91.96764 6171383.966667
4 ACN Accenture Information Technology 64.941392 2958136.65 71.070922 3545633.890625 66.369164 3753502.28125 68.419365 3565505.31746 74.516426 3385196.557377 73.585079 3025692.761905 73.613313 2751022.140625 77.015159 3074781.619048 83.055598 2900945.442623 89.830683 2328701.730159 93.304203 2749655.25 100.145524 2597977.174603 97.517631 3563175.213115 111.108125 2252809.296875 109.2875 2355634.875 115.976984 2508397.920635 116.945403 2577157.33871 120.430635 2231790.698413 128.943889 1905023.0 139.28035 1750601.6
5 ADBE Adobe Inc. Information Technology 39.446525 3569916.466667 44.314602 3709883.078125 47.726891 3170725.46875 55.135659 3985544.47619 64.09123 3924305.557377 64.444087 3614820.587302 70.863547 2636903.1875 70.085413 3332754.111111 74.494 2366642.131148 77.967254 2462360.793651 81.04832 3419304.015625 89.66777 2952103.492063 86.575363 3681589.95082 95.943087 2111794.109375 100.232827 1807114.6875 105.902857 2293033.888889 117.121524 2306142.419355 136.81896 2570180.857143 149.09273 1972602.984127 165.421767 3072533.633333
6 AMD Advanced Micro Devices Information Technology 2.618417 18729917.45 3.446719 28015006.546875 3.809531 29267395.953125 3.58246 24142526.84127 3.820082 25269923.836066 4.055159 21170687.333333 4.052734 29876772.28125 2.756349 18739877.619048 2.80873 17290156.918033 2.424286 14835027.238095 1.884961 13248920.109375 2.217897 9464912.730159 2.274221 14099698.442623 3.875215 22684549.421875 6.380598 35252396.84375 8.375621 46450313.365079 12.677969 64543556.741935 12.223137 76875665.444444 13.069104 64954818.825397 12.779565 61987566.333333
7 AES AES Corporation Utilities 10.011294 7043960.75 11.078719 6695538.890625 11.21443 4528306.390625 12.481889 6451758.952381 12.424484 6094453.245902 12.687333 6143209.666667 13.18332 4156436.75 12.145722 6739359.666667 11.22 7149936.786885 12.090294 6735586.222222 11.003622 6675684.125 9.199517 6452733.015873 9.209875 6921824.786885 10.549273 5180647.8125 11.771031 4655622.859375 11.219024 5835446.761905 11.032306 5824308.983871 11.198079 5616984.539683 11.00023 4938647.380952 10.845133 5099342.833333
8 AFL Aflac Financials 45.823683 4104190.6 48.732852 3435438.578125 54.406344 2189281.828125 59.606294 2141384.761905 58.220943 2175752.688525 57.394619 1926278.460317 56.357148 2120122.046875 54.980079 2444321.063492 57.20691 2244606.655738 59.774214 2073244.571429 57.418914 2387365.484375 59.512198 2358668.0 57.128246 2563169.836066 66.048875 2310928.90625 70.902094 1739223.609375 69.049952 2020415.47619 69.854427 2214819.548387 74.521421 1846555.492063 80.265986 1541004.079365 83.634417 1440761.533333
9 A Agilent Technologies Health Care 29.416733 5100460.5 29.727102 5595136.578125 32.308547 3832068.15625 36.463476 3353795.761905 39.664418 4170020.704918 38.706833 2721890.15873 39.509023 2585435.890625 39.116452 3063606.825397 39.322697 2532460.409836 40.451841 2448139.761905 36.988844 2521089.796875 37.792095 3004962.285714 37.379918 2357458.967213 42.455914 2219362.46875 46.073383 1877364.125 44.994722 1990839.174603 50.09979 2029887.822581 56.717484 2009349.047619 62.170897 1643051.47619 67.057517 1397107.4
10 APD Air Products and Chemicals Materials 77.279858 1093138.95 80.769922 1552260.046875 92.017219 2046325.296875 98.179952 1143625.253968 104.147361 1455185.934426 110.107698 1199132.857143 121.379687 1628685.890625 126.142063 1710348.936508 138.826311 1108369.032787 137.888492 1078376.428571 129.223203 1500850.15625 128.626032 1433344.746032 124.150492 1905698.672131 137.724688 1064761.28125 144.379141 1379288.796875 136.365238 1452917.984127 138.154274 1545373.048387 140.336746 1303113.492063 145.564921 1410870.507937 155.774917 926571.433333
11 AKAM Akamai Information Technology 37.907975 3942939.716667 41.842555 2741468.78125 47.042313 1724353.609375 47.20827 2251961.84127 55.229311 2719331.836066 55.574175 2191404.68254 60.162445 1624116.5 60.26023 1622168.206349 65.99918 1631151.639344 74.17123 1419948.698413 72.035906 1800375.140625 61.614206 2270051.634921 50.80661 2834324.114754 52.572745 1531503.703125 53.521509 1895174.203125 63.64303 2097015.238095 65.740141 1916476.16129 52.79431 2627907.0 47.842269 1919155.206349 51.852112 2122564.7
12 ALB Albemarle Corporation Materials 58.890875 738925.166667 58.256586 798707.59375 58.673633 804109.703125 62.058643 662333.936508 60.770721 631124.557377 64.619349 533168.222222 61.572195 1437183.359375 55.930508 1462997.079365 52.145934 2007614.065574 57.296175 1095955.634921 47.105328 1471876.21875 49.837357 1480702.650794 53.501139 1433544.245902 72.033914 1899285.328125 81.094164 1283941.09375 83.725944 888396.555556 96.452129 1036087.532258 108.284603 1023728.301587 118.462849 1430350.84127 139.081483 1259730.2
13 ARE Alexandria Real Estate Equities Real Estate 61.055792 510877.366667 60.484656 812680.375 57.036727 497802.3125 56.108278 487156.761905 61.974189 523381.442623 66.850333 380025.349206 70.068375 459933.390625 75.831437 490275.349206 88.25882 471547.639344 85.828421 399307.920635 83.823227 364752.734375 84.901849 468813.809524 77.144254 515845.04918 91.115141 623097.203125 104.870312 587047.640625 104.227111 609395.222222 110.419274 1028053.66129 114.675476 727220.412698 119.310159 589167.52381 122.816917 492825.366667
14 ALGN Align Technology Health Care 30.768375 965606.483333 34.442398 932521.21875 43.574133 623338.703125 53.761627 1102201.47619 56.051926 1269293.868852 51.956437 1145288.68254 54.479203 664095.84375 52.876032 715800.730159 56.921369 932696.57377 59.057738 876989.904762 59.650117 730518.59375 63.996032 649318.666667 65.351673 654232.180328 76.420911 578877.9375 90.372086 596522.5 92.9374 876983.809524 100.893159 787098.354839 135.617587 1255167.301587 170.964286 868528.095238 211.071367 1115914.7
16 LNT Alliant Energy Utilities 19.7321 1096029.6 21.529141 1031537.703125 22.013383 1133217.8125 22.456794 1118667.571429 23.2285 1122770.180328 25.6905 1148605.238095 25.824242 1289851.59375 27.854175 1222716.936508 29.629336 1519714.639344 27.83704 1334767.587302 27.248375 1400986.171875 28.04204 1324544.444444 31.741197 1505386.147541 35.26932 2127155.546875 37.617852 1506166.765625 35.672437 1294674.31746 37.370742 1213372.370968 39.754135 1381354.412698 41.187817 1011096.301587 42.844913 1377366.533333
17 ALL Allstate Financials 41.509183 3434803.983333 44.8045 3923570.75 46.540273 2952061.890625 49.535484 2502155.619048 50.100533 3070192.934426 54.136579 2482041.920635 56.632711 2327236.03125 62.293754 2418033.063492 67.145639 2569832.672131 65.595103 2616602.15873 60.026805 3279574.0625 60.003389 2817078.380952 60.868311 2948406.852459 65.241266 1997076.203125 67.461008 1629849.046875 69.276563 2349906.809524 77.776903 1819662.048387 84.150286 1847252.634921 90.686227 1960922.936508 94.205167 1809744.966667
18 GOOGL Alphabet Inc. (Class A) Communication Services 389.24 4710365.566667 424.299688 4744065.375 443.673359 3683857.9375 506.558333 3619180.253968 584.911803 4558310.721311 551.503968 2446307.904762 589.088203 1612151.828125 547.000794 2198303.619048 541.513279 2368245.360656 549.509841 1767785.650794 644.285859 2616696.96875 738.674563 2054101.825397 736.329672 2133765.016393 734.000695 1371835.65625 781.420039 1061037.15625 800.315841 1732969.52381 840.159944 1500393.129032 934.914405 1699425.68254 948.011262 1649185.634921 1011.346267 1591438.266667
19 MO Altria Consumer Staples 27.658883 13021175.25 29.808195 11530296.125 29.602164 10201472.640625 31.331603 8860092.920635 31.352557 10249242.163934 35.092706 8557514.396825 37.831523 7041493.1875 43.46754 7621816.238095 47.950393 8041661.639344 46.268952 7200026.952381 49.379695 7766711.328125 53.904056 6346995.714286 56.844107 7185249.213115 60.722273 6438985.9375 63.665594 5463859.625 62.15573 6994040.809524 70.551298 6767171.483871 71.917008 5800049.904762 66.133476 8136257.253968 64.307358 6578395.933333
20 AMZN Amazon Consumer Discretionary 265.986 3522848.05 266.287734 3039878.9375 297.771328 2420409.5625 358.541905 2940043.174603 371.362541 3931548.016393 316.08119 4657967.349206 332.332188 3587100.65625 311.519762 4204031.571429 351.535492 3923801.163934 418.211111 3037333.206349 506.322656 4089669.734375 628.478849 4296253.888889 567.011221 4989470.04918 675.937945 3135022.640625 764.753367 2378967.3125 784.331508 3940060.904762 831.986234 3134660.548387 954.110381 3873340.206349 982.462373 3407886.507937 1030.686317 3624009.266667
In [ ]:
#Identify all GICS Sectors -> 11 sectors total
sectors = cleaned_stocks_df["GICS  Sector"]
sectors = list(sectors.sort_values().unique())
sectors
Out[ ]:
['Communication Services',
 'Consumer Discretionary',
 'Consumer Staples',
 'Energy',
 'Financials',
 'Health Care',
 'Industrials',
 'Information Technology',
 'Materials',
 'Real Estate',
 'Utilities']

Finally, after identifying every sector within this data set, we want to create a separate dataframe that contains the quarterly averages for each sector as a whole. This will eventually be useful because it will allow us to analyze and predict entire sectors’ performances.

In [ ]:
# high unemployment = low interest rates meaning look at faster increase in discretionary comsumer spending and real estate
# opposite for low unemployment plus a faster increase in healthcare/industrials/info tech 

sectors_avg_df = cleaned_stocks_df.loc[:, "GICS  Sector":].groupby("GICS  Sector").mean().loc[:, ::2].T

# Reformat index column to match the gdp_unemp_df quarter format
def reformat_quarter(original):
    original = original.split()
    new = f"{original[0]}{original[1]}"
    
    return new
    
sectors_avg_df.index = sectors_avg_df.index.map(reformat_quarter)
sectors_avg_df.columns.name = ""
sectors_avg_df.index.name = "Quarter"

sectors_avg_df
Out[ ]:
Communication Services Consumer Discretionary Consumer Staples Energy Financials Health Care Industrials Information Technology Materials Real Estate Utilities
Quarter
2013Q1 53.606051 83.967426 46.972873 51.025156 43.034767 58.227233 50.13672 37.150871 56.204277 67.174301 35.028688
2013Q2 61.00862 89.308101 51.391396 53.455384 46.718175 64.098781 53.109493 38.288068 57.968286 70.041769 37.733436
2013Q3 65.873971 93.510542 52.518399 56.484413 50.283462 68.028496 57.37949 40.657204 59.85785 68.106862 37.297389
2013Q4 72.531378 102.320923 54.638603 60.924815 54.157675 73.369455 62.558362 43.918417 63.990034 68.463233 37.789467
2014Q1 79.785127 111.72085 55.017444 63.643351 55.838741 79.980759 66.707765 48.093618 67.619884 72.223258 39.18897
2014Q2 78.849469 109.985092 57.418072 71.784604 56.229804 80.388476 69.673904 49.389855 70.97771 76.154216 42.677177
2014Q3 84.941217 115.342292 58.62104 75.521954 58.931846 86.80099 71.378148 52.955226 73.199359 79.7695 43.006188
2014Q4 82.047915 119.922544 62.203928 64.713406 61.576242 93.153398 73.974583 54.933903 71.670788 84.433212 46.418571
2015Q1 85.628869 127.960974 66.312432 61.002086 63.637636 101.919575 77.389209 59.238162 76.910675 93.024833 48.208254
2015Q2 89.74348 132.918902 67.514595 64.015145 65.984916 108.131197 78.462107 62.797285 80.508134 92.387045 45.882142
2015Q3 97.810651 140.813396 67.645306 54.929293 65.201938 108.228322 74.881097 60.153303 75.929594 92.55864 45.278884
2015Q4 105.505572 145.419427 70.150361 53.150133 66.175417 104.478399 75.051292 61.515457 74.955804 98.474586 46.425979
2016Q1 102.474182 137.820155 72.457023 45.596145 60.995942 98.875879 72.438275 56.736884 71.177349 99.041406 50.134951
2016Q2 106.637626 146.901684 76.714013 53.503052 66.218309 106.447697 80.129022 60.549082 82.362036 107.645605 54.423475
2016Q3 112.92182 151.291667 79.984418 55.962611 69.761744 113.4712 84.758315 64.378083 85.308012 112.820678 56.810371
2016Q4 117.799688 148.332804 75.535874 60.096161 74.302982 108.67958 89.435799 68.264731 85.779932 104.352883 53.869594
2017Q1 128.686692 160.978991 77.995511 60.850663 81.860864 116.30322 97.096549 74.68916 93.027048 109.16407 57.069938
2017Q2 139.104742 176.364536 80.152594 56.972089 83.987991 127.73827 99.816145 80.491237 97.689698 115.642488 60.883284
2017Q3 145.288521 180.998835 79.467716 56.349285 89.211384 136.841759 104.330158 85.088161 100.97046 120.344693 63.092819
2017Q4 149.647204 193.286556 79.599355 60.289606 94.736015 140.991576 112.327765 91.978299 108.369587 123.064697 64.680491

Exploratory Analysis and Data Visualization¶

Initial Visualization¶

5 of the 11 sectors in the S&P 500 are known to fluctuate the most based on unemployment rates. These sectors include Consumer Discretionary, Real Estate, Health Care, Industrials, Information Technology. Our first visualization shows these 5 sectors performance over time. Next is a visualization of natural and underlying long term unemployment over time. We initially did these separately to just look at their general direction of growth alone before we analyzed their effects on each other. More information about these connections can be found here and here.

In [ ]:
sectors_gdp_unemp_df = pd.merge(sectors_avg_df, gdp_unemp_df, left_index=True, right_index=True)

unemp_sectors_ax = sectors_gdp_unemp_df[["Consumer Discretionary", "Real Estate", "Health Care", "Industrials", "Information Technology"]].plot(figsize=(10,6))
unemp_sectors_ax.set_xticks(range(len(sectors_gdp_unemp_df)))
unemp_sectors_ax.set_xticklabels(sectors_gdp_unemp_df.index.tolist(), rotation=90)

unemp_ax = sectors_gdp_unemp_df[["Unemployment Underlying Long Term", "Unemployment Natural"]].plot(figsize=(10,6))
unemp_ax.set_xticks(range(len(sectors_gdp_unemp_df)))
unemp_ax.set_xticklabels(sectors_gdp_unemp_df.index.tolist(), rotation=90)
Out[ ]:
[Text(0, 0, '2013Q1'),
 Text(1, 0, '2013Q2'),
 Text(2, 0, '2013Q3'),
 Text(3, 0, '2013Q4'),
 Text(4, 0, '2014Q1'),
 Text(5, 0, '2014Q2'),
 Text(6, 0, '2014Q3'),
 Text(7, 0, '2014Q4'),
 Text(8, 0, '2015Q1'),
 Text(9, 0, '2015Q2'),
 Text(10, 0, '2015Q3'),
 Text(11, 0, '2015Q4'),
 Text(12, 0, '2016Q1'),
 Text(13, 0, '2016Q2'),
 Text(14, 0, '2016Q3'),
 Text(15, 0, '2016Q4'),
 Text(16, 0, '2017Q1'),
 Text(17, 0, '2017Q2'),
 Text(18, 0, '2017Q3'),
 Text(19, 0, '2017Q4')]
No description has been provided for this image
No description has been provided for this image

We then combined these 2 graphs to analyze the economic factors effects on those 5 GICS Sectors from the S&P 500. What we can see here is with higher unemployment rates, we will naturally have lower interest rates. By doing so we will see a faster increase in growth for consumer spending, real estate markets, and information technology. However, the opposite happens for the healthcare and industrial sectors. These sectors as the unemployment decreases tend to have a faster growth and as it starts to plateau or even increase the growth rate for those 2 sectors becomes slower.

In [ ]:
#Checking industries which would be affected by Unemployment with the Unemployment Natural. 
scaler = MinMaxScaler()
norm_columns = sectors_gdp_unemp_df.columns[1:]
df_normalized = pd.DataFrame(scaler.fit_transform(sectors_gdp_unemp_df[norm_columns]), columns=norm_columns)
df_normalized
unemp_ax = df_normalized[["Consumer Discretionary", "Real Estate", "Health Care", "Industrials", "Information Technology", "Unemployment Natural"]].plot(figsize=(10,6))
unemp_ax.set_xticks(range(len(df_normalized)))
unemp_ax.set_xticklabels(df_normalized.index.tolist(), rotation=90)
Out[ ]:
[Text(0, 0, '0'),
 Text(1, 0, '1'),
 Text(2, 0, '2'),
 Text(3, 0, '3'),
 Text(4, 0, '4'),
 Text(5, 0, '5'),
 Text(6, 0, '6'),
 Text(7, 0, '7'),
 Text(8, 0, '8'),
 Text(9, 0, '9'),
 Text(10, 0, '10'),
 Text(11, 0, '11'),
 Text(12, 0, '12'),
 Text(13, 0, '13'),
 Text(14, 0, '14'),
 Text(15, 0, '15'),
 Text(16, 0, '16'),
 Text(17, 0, '17'),
 Text(18, 0, '18'),
 Text(19, 0, '19')]
No description has been provided for this image

3 of the 11 sectors in the S&P 500 are known to fluctuate the most based on unemployment rates. These sectors include Energy, Materials, and Industrials. Our first visualization shows these 3 sectors performance over time. Then, a visualization of Real GDP adjusted for inflation as well as Nominal GDP over time.

In [ ]:
# looking at gdp down energy/industrials/materials should be down too. gdp down means people abroad are doing better than us.
gdp_sectors_ax = sectors_gdp_unemp_df[["Energy", "Materials", "Industrials"]].plot(figsize=(10,6))
gdp_sectors_ax.set_xticks(range(len(sectors_gdp_unemp_df)))
gdp_sectors_ax.set_xticklabels(sectors_gdp_unemp_df.index.tolist(), rotation=90)

gdp_ax = sectors_gdp_unemp_df[["Real GDP", "Nominal GDP"]].plot(figsize=(10,6))
gdp_ax.set_xticks(range(len(sectors_gdp_unemp_df)))
gdp_ax.set_xticklabels(sectors_gdp_unemp_df.index.tolist(), rotation=90)
Out[ ]:
[Text(0, 0, '2013Q1'),
 Text(1, 0, '2013Q2'),
 Text(2, 0, '2013Q3'),
 Text(3, 0, '2013Q4'),
 Text(4, 0, '2014Q1'),
 Text(5, 0, '2014Q2'),
 Text(6, 0, '2014Q3'),
 Text(7, 0, '2014Q4'),
 Text(8, 0, '2015Q1'),
 Text(9, 0, '2015Q2'),
 Text(10, 0, '2015Q3'),
 Text(11, 0, '2015Q4'),
 Text(12, 0, '2016Q1'),
 Text(13, 0, '2016Q2'),
 Text(14, 0, '2016Q3'),
 Text(15, 0, '2016Q4'),
 Text(16, 0, '2017Q1'),
 Text(17, 0, '2017Q2'),
 Text(18, 0, '2017Q3'),
 Text(19, 0, '2017Q4')]
No description has been provided for this image
No description has been provided for this image

This combined visualization of the adjusted GDP and the typical sectors it affects shows a general positive correlation between the sectors and GDP. The trend should be as GDP is down, energy, industrials, and materials should also be down (in other words this is a linear relationship). A notable exception is the Energy sector. The reason the Energy sector is an exception to this trend is because during mid 2014 and early 2016 where the entire oil industry (which holds a large stake in the energy sector) plummeted. This skewed the data for the Energy sector, but its important to note still since it shows a good example of other factors creating shifts in the market. Article about the oil prices dropping linked here.

In [ ]:
#Checking industries which would be effected by GDP with the nominal GDP. 
nat_unemp_ax = df_normalized[["Energy", "Materials", "Industrials", "Nominal GDP"]].plot(figsize=(10,6))
nat_unemp_ax.set_xticks(range(len(df_normalized)))
nat_unemp_ax.set_xticklabels(df_normalized.index.tolist(), rotation=90)
Out[ ]:
[Text(0, 0, '0'),
 Text(1, 0, '1'),
 Text(2, 0, '2'),
 Text(3, 0, '3'),
 Text(4, 0, '4'),
 Text(5, 0, '5'),
 Text(6, 0, '6'),
 Text(7, 0, '7'),
 Text(8, 0, '8'),
 Text(9, 0, '9'),
 Text(10, 0, '10'),
 Text(11, 0, '11'),
 Text(12, 0, '12'),
 Text(13, 0, '13'),
 Text(14, 0, '14'),
 Text(15, 0, '15'),
 Text(16, 0, '16'),
 Text(17, 0, '17'),
 Text(18, 0, '18'),
 Text(19, 0, '19')]
No description has been provided for this image

As an added bonus for visualization, we decided to show each sector’s normalized individual company performance.

In [ ]:
norm_cleaned_stocks_df = cleaned_stocks_df
quarterly_avg_cols = [col for col in cleaned_stocks_df.columns if 'Avg' in col]
scaler = MinMaxScaler()
norm_cleaned_stocks_df[quarterly_avg_cols] = scaler.fit_transform(cleaned_stocks_df[quarterly_avg_cols])
for sector in norm_cleaned_stocks_df['GICS  Sector'].unique():
    sector_data = norm_cleaned_stocks_df[norm_cleaned_stocks_df['GICS  Sector'] == sector]

    fig, ax = plt.subplots(figsize=(15, 8))

    # Plot each company's data points within the sector
    for company in sector_data.index:
        company_data = sector_data.loc[company, quarterly_avg_cols]
        ax.plot(quarterly_avg_cols, company_data, label=sector_data.loc[company, 'Symbol'])

    ax.set_xticks(range(len(quarterly_avg_cols)))
    ax.set_xticklabels(quarterly_avg_cols, rotation=90)

    ax.set_xlabel("Quarter")
    ax.set_ylabel("Normalized Value")
    ax.set_title(f"Performance of Companies in {sector} Sector Over Time")
    
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Linear Regression¶

This visualization serves to display the general trend of each sector as a whole over the course of this tutorial’s scope. A more in-depth linear regression of the data is displayed later in this tutorial, but this representation is helpful just to show basic sector performance in relation to our economic factors.

In [ ]:
#linear regression of the cleaned S&P500 by industry (GICS Cateogry)
sectors_reset_index = sectors_gdp_unemp_df.reset_index()
fig, ax = plt.subplots(figsize=(20, 12))

# Plot each sector
for sector in sectors:
    ax.scatter(sectors_reset_index["Quarter"], sectors_reset_index[sector], label=sector)

    # Fit linear regression
    X = np.arange(len(sectors_reset_index)).reshape(-1, 1)
    y = sectors_reset_index[sector].values.reshape(-1, 1)
    model = LinearRegression()
    model.fit(X, y)
    y_pred = model.predict(X)

    ax.plot(sectors_reset_index["Quarter"], y_pred, label=f'{sector} Trend', linestyle='--')

ax.set_xticks(range(len(sectors_reset_index)))
ax.set_xticklabels(sectors_reset_index["Quarter"], rotation=90)

ax.set_xlabel("Quarter")
ax.set_ylabel("Value")
ax.set_title("Sector Performance Over Time with Linear Regression")
ax.legend()

plt.show()
No description has been provided for this image

Explaining the Relevance of a Hypothesis Test¶

Our hypothesis is that there is a linear relationship between our economic factors and the individual sectors in the S&P 500. Using the below formulas, if the calculated p-value is below 0.05, then it is quite likely that there is a linear relationship between the economic factors and the sectors of the S&P 500

$$ \text{variance} = \text{MSE} \times \left( \text{diag}\left( \left( X^T X \right)^{-1} \right) \right) $$ $$ \text{standard\_errors} = \sqrt{\text{variance}} $$ $$ \text{tvalues} = \frac{\text{model.coef}}{\text{standard\_errors}} $$ $$ \text{p\_values} = [2 \times (1 - \text{stats.t.cdf}(|t|, n-k-1)) \text{ for } t \text{ in } \text{t\_values}] $$

Correlation Analysis¶

Our next piece of data analysis includes a correlation matrix detailing the strength of correlation between sectors’ performances, gdp, and unemployment. The matrix also indicates a positive or negative correlation (-1 being the strongest negative correlation and 1 being the strongest positive correlation). For example, we can see a strong negative correlation between unemployment rates and the performance of most sectors. We can also see a strong positive correlation between the GDP and sector performance. Energy showed a poor correlation because of what was discussed about skewed data, above.

Seaborn visualization library found here

To learn more about correlation analysis and regression click here

In [ ]:
#Lecture 9 (correlation between 2 varialbes) 
df_reset = sectors_gdp_unemp_df.reset_index()
numeric_df = df_reset.drop(columns=['Quarter'], errors='ignore')
correlation_matrix = numeric_df.corr()
print(correlation_matrix)
plt.figure(figsize=(14, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=.5)
plt.title('Correlation Matrix of Economic Indicators and Stock Performance')
plt.show()
                                   Communication Services  \
Communication Services                           1.000000   
Consumer Discretionary                           0.987924   
Consumer Staples                                 0.935060   
Energy                                          -0.101493   
Financials                                       0.978598   
Health Care                                      0.959232   
Industrials                                      0.971948   
Information Technology                           0.975336   
Materials                                        0.967361   
Real Estate                                      0.957317   
Utilities                                        0.967086   
Real GDP                                         0.981291   
Nominal GDP                                      0.987339   
Unemployment Underlying Long Term               -0.874037   
Unemployment Natural                            -0.805601   

                                   Consumer Discretionary  Consumer Staples  \
Communication Services                           0.987924          0.935060   
Consumer Discretionary                           1.000000          0.949784   
Consumer Staples                                 0.949784          1.000000   
Energy                                          -0.115463         -0.195799   
Financials                                       0.975527          0.894617   
Health Care                                      0.986517          0.942485   
Industrials                                      0.969548          0.901185   
Information Technology                           0.982833          0.903430   
Materials                                        0.969203          0.898534   
Real Estate                                      0.972393          0.981577   
Utilities                                        0.965671          0.953500   
Real GDP                                         0.983483          0.980323   
Nominal GDP                                      0.986387          0.970951   
Unemployment Underlying Long Term               -0.910904         -0.966750   
Unemployment Natural                            -0.848124         -0.897720   

                                     Energy  Financials  Health Care  \
Communication Services            -0.101493    0.978598     0.959232   
Consumer Discretionary            -0.115463    0.975527     0.986517   
Consumer Staples                  -0.195799    0.894617     0.942485   
Energy                             1.000000    0.012237    -0.057815   
Financials                         0.012237    1.000000     0.967304   
Health Care                       -0.057815    0.967304     1.000000   
Industrials                        0.065185    0.994526     0.964797   
Information Technology            -0.008027    0.994420     0.980619   
Materials                          0.056745    0.989092     0.965655   
Real Estate                       -0.213068    0.927934     0.962079   
Utilities                         -0.113515    0.952533     0.951182   
Real GDP                          -0.155328    0.952111     0.966176   
Nominal GDP                       -0.114904    0.964317     0.968687   
Unemployment Underlying Long Term  0.194390   -0.833426    -0.919999   
Unemployment Natural              -0.054436   -0.785732    -0.877693   

                                   Industrials  Information Technology  \
Communication Services                0.971948                0.975336   
Consumer Discretionary                0.969548                0.982833   
Consumer Staples                      0.901185                0.903430   
Energy                                0.065185               -0.008027   
Financials                            0.994526                0.994420   
Health Care                           0.964797                0.980619   
Industrials                           1.000000                0.990682   
Information Technology                0.990682                1.000000   
Materials                             0.994519                0.989337   
Real Estate                           0.929761                0.941982   
Utilities                             0.961877                0.955645   
Real GDP                              0.952584                0.957551   
Nominal GDP                           0.965429                0.967403   
Unemployment Underlying Long Term    -0.836263               -0.855386   
Unemployment Natural                 -0.804078               -0.803824   

                                   Materials  Real Estate  Utilities  \
Communication Services              0.967361     0.957317   0.967086   
Consumer Discretionary              0.969203     0.972393   0.965671   
Consumer Staples                    0.898534     0.981577   0.953500   
Energy                              0.056745    -0.213068  -0.113515   
Financials                          0.989092     0.927934   0.952533   
Health Care                         0.965655     0.962079   0.951182   
Industrials                         0.994519     0.929761   0.961877   
Information Technology              0.989337     0.941982   0.955645   
Materials                           1.000000     0.935021   0.962263   
Real Estate                         0.935021     1.000000   0.979769   
Utilities                           0.962263     0.979769   1.000000   
Real GDP                            0.946958     0.985343   0.974638   
Nominal GDP                         0.959721     0.979992   0.977064   
Unemployment Underlying Long Term  -0.828671    -0.931752  -0.869724   
Unemployment Natural               -0.784707    -0.840642  -0.800247   

                                   Real GDP  Nominal GDP  \
Communication Services             0.981291     0.987339   
Consumer Discretionary             0.983483     0.986387   
Consumer Staples                   0.980323     0.970951   
Energy                            -0.155328    -0.114904   
Financials                         0.952111     0.964317   
Health Care                        0.966176     0.968687   
Industrials                        0.952584     0.965429   
Information Technology             0.957551     0.967403   
Materials                          0.946958     0.959721   
Real Estate                        0.985343     0.979992   
Utilities                          0.974638     0.977064   
Real GDP                           1.000000     0.998468   
Nominal GDP                        0.998468     1.000000   
Unemployment Underlying Long Term -0.940251    -0.926316   
Unemployment Natural              -0.866446    -0.859056   

                                   Unemployment Underlying Long Term  \
Communication Services                                     -0.874037   
Consumer Discretionary                                     -0.910904   
Consumer Staples                                           -0.966750   
Energy                                                      0.194390   
Financials                                                 -0.833426   
Health Care                                                -0.919999   
Industrials                                                -0.836263   
Information Technology                                     -0.855386   
Materials                                                  -0.828671   
Real Estate                                                -0.931752   
Utilities                                                  -0.869724   
Real GDP                                                   -0.940251   
Nominal GDP                                                -0.926316   
Unemployment Underlying Long Term                           1.000000   
Unemployment Natural                                        0.951430   

                                   Unemployment Natural  
Communication Services                        -0.805601  
Consumer Discretionary                        -0.848124  
Consumer Staples                              -0.897720  
Energy                                        -0.054436  
Financials                                    -0.785732  
Health Care                                   -0.877693  
Industrials                                   -0.804078  
Information Technology                        -0.803824  
Materials                                     -0.784707  
Real Estate                                   -0.840642  
Utilities                                     -0.800247  
Real GDP                                      -0.866446  
Nominal GDP                                   -0.859056  
Unemployment Underlying Long Term              0.951430  
Unemployment Natural                           1.000000  
No description has been provided for this image

Analysis, Hypothesis Testing, and Machine Learning¶

Our Model Choice Analysis¶

When deciding on which model to use we took into consideration KNN, SVM, SGD, CART, Random Forest, and a Linear/Multivariate Regression. Our analysis for choosing is below.

  1. K-Nearest Neighbors (KNN)

    Strengths:

     Simplicity: Easy to understand
    
     Data Size: Works well with small to medium datasets
    
     Interpretability: Easy to interpret and visualize

    Weaknesses:

     Not Scalable: Not suitable for large datasets because it requires significant memory and computation power
    
     Sensitivity: Performance can be decreased by irrelevant data features
    
     Distance Metric: Very dependent on the choice of distance metric, meaning the wrong metric can ruin the whole model
  2. Support Vector Machines (SVM)

    Strengths:

     Effective with Many Features: Works well when the number of dimensions/features is greater than the number of samples.
    
     Memory Efficient
    
     Robustness: Because it is effective when the number of features exceeds the number of samples, it is a very consistent and robust model

    Weaknesses:

     Training Time: Computationally intensive and time-consuming for large datasets
    
     Choice of Kernel: Requires careful tuning of the kernel and regularization parameter
    
     Lack of Interpretability: Harder to interpret the model compared to simpler ones like linear regression
  3. Stochastic Gradient Descent (SGD)

    Strengths:

     Efficiency: Suitable for a larger scale and very efficient when dealing with large datasets
    
     Online Learning: Capable of updating the model incrementally as new data arrives. Perfect in the long run for stock and economic data
    
     Flexibility: Can be used with a variety of loss functions

    Weaknesses:

     Convergence Issues: Can converge to less than optimal solutions or diverge if the learning rate is not properly tuned
    
     Needs to be Normalized: The performance is sensitive to the scaling of the input features.
  4. Classification and Regression Trees (CART)

    Strengths:

     Interpretability: Easy to interpret and visualize.
    
     Non-linearity: Can capture nonlinear relationships between features
    
     No Need for Normalization

    Weaknesses:

     Overfitting: Prone to overfitting
    
     Instability: Small changes in the data can result in a completely different tree.
    
     Bias-Variance Tradeoff: Requires careful tuning of tree depth to balance bias and variance.
  1. Random Forest

    Strengths:

     Robustness: Reduces overfitting by averaging multiple decision trees
    
     Handles Missing Data: Can handle missing values effectively if not cleaned (irrelevant to this, though)
    
     Versatility: Works well with both classification and regression

    Weaknesses:

     Complexity: Harder to interpret compared to a single decision tree
    
     Computationally Intensive: Requires significant computational resources for large datasets
  2. Linear/Multivariate Regression

    Strengths:

     Simplicity: Easiest by far interpret and visualize
    
     Efficiency: Computationally efficient for small to medium-sized datasets.
    
     Assumptions: Works well when the relationship between features and target is approximately linear

    Weaknesses:

     Linearity: Assumes a linear relationship between the features and target, which is not always the case
    
     Outliers: Sensitive to outliers, which can skew the results

Further info about many of these models can be found here.

Comparing all of these strengths and weaknesses led us to conclude that the best models for our data set, taking into account its size and structure(number of features and data points), would be a Linear Regression, Random Forest Model, and Stochastic Gradient Descent. Each model covers a different assumption of the data and we believe that this group of 3 covers as many pertinent perspectives with as little redundancy as possible.

Linear Regression¶

We chose to continue with a more in depth trained and tested linear regression model because based on the initial linear regression and the correlation analysis, it was clear that the sectors are linearly related to each other to some degree (i.e. a vast majority of the numbers in the correlation heatmap are near 1 or -1).

In [ ]:
#make this into a function so it can be used for everything else
def LinReg(size, sector):
    X = sectors_gdp_unemp_df.drop(columns=[sector])
    y = sectors_gdp_unemp_df[sector]

    # Split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=size, random_state=42)

    # Normalize
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Train
    lin_reg = LinearRegression()
    lin_reg.fit(X_train_scaled, y_train)

    # Predictions and evaluation
    y_pred = lin_reg.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    print(f'Mean Squared Error: {mse}')

    # Plot prediction vs actual
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred, edgecolor='k', alpha=0.7)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title('Predicted vs. Actual Values for ' + sector)
    plt.grid(True)
    plt.show()
In [ ]:
LinReg(0.2, "Consumer Discretionary")
Mean Squared Error: 30.024296579397454
No description has been provided for this image
In [ ]:
LinReg(0.4, "Real Estate")
Mean Squared Error: 3.7438870909191553
No description has been provided for this image
In [ ]:
LinReg(0.1, "Health Care")
Mean Squared Error: 4.564097219785389
No description has been provided for this image
In [ ]:
LinReg(0.2, "Industrials") 
Mean Squared Error: 6.675113759988081
No description has been provided for this image
In [ ]:
LinReg(0.1, "Information Technology")
Mean Squared Error: 0.6960937172195818
No description has been provided for this image
In [ ]:
LinReg(0.2, "Energy")
Mean Squared Error: 9.589620069450447
No description has been provided for this image
In [ ]:
LinReg(0.2, "Materials")
Mean Squared Error: 1.9131354533793314
No description has been provided for this image

Random Forest¶

We chose to also use a Random Forest Model for each sector because of its robust nature and ability to handle large amounts of features without overfitting. After running the random forest on each sector individually, we found that our model is between 87.5% and 100% accurate, depending on the sector. It is likely, however, that the 100% accurate models may have been mildly skewed by the lower amount of data in that sector.

In [ ]:
def RandForest(size, sector):
    X = sectors_gdp_unemp_df.drop(columns=[sector])
    y = sectors_gdp_unemp_df[sector]

    median_value = y.median()
    y_binary = (y > median_value).astype(int)

    # Split
    X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=size, random_state=42)

    # Normalize
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    rf_clf = RandomForestClassifier(random_state=42)
    rf_clf.fit(X_train_scaled, y_train)

    # Predict and evaluate
    y_pred = rf_clf.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    conf_matrix = confusion_matrix(y_test, y_pred)
    class_report = classification_report(y_test, y_pred)

    print(f'Accuracy: {accuracy}')
    print(f'Precision: {precision}')
    print(f'Recall: {recall}')
    print(f'F1-Score: {f1}')
    print(f'Confusion Matrix:\n{conf_matrix}')
    print(f'Classification Report:\n{class_report}')

    # Confusion matrix
    plt.figure(figsize=(8, 6))
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix for ' + sector)
    plt.show()
In [ ]:
RandForest(0.4, "Consumer Discretionary")
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0
Confusion Matrix:
[[5 0]
 [0 3]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         3

    accuracy                           1.00         8
   macro avg       1.00      1.00      1.00         8
weighted avg       1.00      1.00      1.00         8

No description has been provided for this image
In [ ]:
RandForest(0.75, "Real Estate")
Accuracy: 0.9333333333333333
Precision: 1.0
Recall: 0.875
F1-Score: 0.9333333333333333
Confusion Matrix:
[[7 0]
 [1 7]]
Classification Report:
              precision    recall  f1-score   support

           0       0.88      1.00      0.93         7
           1       1.00      0.88      0.93         8

    accuracy                           0.93        15
   macro avg       0.94      0.94      0.93        15
weighted avg       0.94      0.93      0.93        15

No description has been provided for this image
In [ ]:
RandForest(0.8, "Health Care")
Accuracy: 0.9375
Precision: 0.8888888888888888
Recall: 1.0
F1-Score: 0.9411764705882353
Confusion Matrix:
[[7 1]
 [0 8]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.88      0.93         8
           1       0.89      1.00      0.94         8

    accuracy                           0.94        16
   macro avg       0.94      0.94      0.94        16
weighted avg       0.94      0.94      0.94        16

No description has been provided for this image
In [ ]:
RandForest(0.5, "Industrials")
Accuracy: 0.9
Precision: 1.0
Recall: 0.8333333333333334
F1-Score: 0.9090909090909091
Confusion Matrix:
[[4 0]
 [1 5]]
Classification Report:
              precision    recall  f1-score   support

           0       0.80      1.00      0.89         4
           1       1.00      0.83      0.91         6

    accuracy                           0.90        10
   macro avg       0.90      0.92      0.90        10
weighted avg       0.92      0.90      0.90        10

No description has been provided for this image
In [ ]:
RandForest(0.7, "Information Technology")
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0
Confusion Matrix:
[[6 0]
 [0 8]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         6
           1       1.00      1.00      1.00         8

    accuracy                           1.00        14
   macro avg       1.00      1.00      1.00        14
weighted avg       1.00      1.00      1.00        14

No description has been provided for this image
In [ ]:
RandForest(0.6, "Materials")
Accuracy: 0.9166666666666666
Precision: 0.8571428571428571
Recall: 1.0
F1-Score: 0.9230769230769231
Confusion Matrix:
[[5 1]
 [0 6]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.83      0.91         6
           1       0.86      1.00      0.92         6

    accuracy                           0.92        12
   macro avg       0.93      0.92      0.92        12
weighted avg       0.93      0.92      0.92        12

No description has been provided for this image
In [ ]:
RandForest(0.4, "Energy")
Accuracy: 0.875
Precision: 1.0
Recall: 0.75
F1-Score: 0.8571428571428571
Confusion Matrix:
[[4 0]
 [1 3]]
Classification Report:
              precision    recall  f1-score   support

           0       0.80      1.00      0.89         4
           1       1.00      0.75      0.86         4

    accuracy                           0.88         8
   macro avg       0.90      0.88      0.87         8
weighted avg       0.90      0.88      0.87         8

No description has been provided for this image

Stochastic Gradient Descent¶

Lastly, our case for using a Stochastic Gradient Descent was the strongest of the models for multiple reasons. The most notable being: SGD’s efficiency with larger data sets SGD’s ability to be updated as new data arrives, which would future-proof the model to some extent if any new stock market/economic factor data gets added

In [ ]:
def SGDReg(size, sector):
    X = sectors_gdp_unemp_df.drop(columns=[sector])
    y = sectors_gdp_unemp_df[sector]

    # Split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=size, random_state=42)

    # Normalize
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Train
    sgd_reg = SGDRegressor(max_iter=100000, tol=1e-3, random_state=42)
    sgd_reg.fit(X_train_scaled, y_train)

   # Predict and evaluate
    y_pred = sgd_reg.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    print(f'Mean Squared Error: {mse}')
    
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred, edgecolor='k', alpha=0.7)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title('Predicted vs. Actual Values for ' + sector )
    plt.grid(True)
    plt.show()
In [ ]:
SGDReg(0.3, "Consumer Discretionary")
Mean Squared Error: 7.098322600123848
No description has been provided for this image
In [ ]:
SGDReg(0.45, "Real Estate")
Mean Squared Error: 9.264085113648346
No description has been provided for this image
In [ ]:
SGDReg(0.2, "Health Care")
Mean Squared Error: 5.91806112583378
No description has been provided for this image
In [ ]:
SGDReg(0.3, "Industrials")
Mean Squared Error: 2.96630489113503
No description has been provided for this image
In [ ]:
SGDReg(0.4, "Information Technology") 
Mean Squared Error: 2.110729967124219
No description has been provided for this image
In [ ]:
#Energy Stochastic Gradient Descent
SGDReg(0.4, "Energy") 
Mean Squared Error: 4.961598180501632
No description has been provided for this image
In [ ]:
SGDReg(0.3, "Materials")
Mean Squared Error: 1.6855093502619864
No description has been provided for this image

Interpretation: Insight & Policy Decision¶

While conducting our data analysis we came to the conclusion that there are various factors which influence the growth of certain industries stocks more than others. As both our analysis and predictions show, the sectors – Consumer Discretionary, Real Estate, Health Care, Industrials, and Information Technology – are affected more by unemployment while sectors – Energy, Materials, and Industrials – are affected more by GDP. This also relates to our initial idea that unemployment will have an inverse relationship with the sectors it affects while GDP has a linear or same moving trend as the sectors it affects.

We also can identify that we didn't incorporate 4 sectors into our analysis. This was because those sectors were not very affected by the economic evaluations we were looking at. For example, financials are more affected by interest rates compared to GDP and unemployment. Now, this doesn’t mean there’s no relationship between those sectors and our economic indicators. This just shows that the small relationship can be considered statistically insignificant since there's either a slight or no change for each of those sectors.

What we can get from this complete analysis and prediction is that although we can see the S&P 500 moving in a rather positive and growing direction, economic factors, especially dramatic ones, can have long term effects. We can also conclude that it is important to look at other factors, as our Energy data showed a skew because of unforeseen and non predictable events which happened during the timeline of which our data was collected. Although interest rates would have been a relevant factor here, we didn’t include them in our analysis because we wanted to show how one factor can affect the stock prices, so by removing a larger economic factor we were able to easily show this.